49
Containerising Distributed Pipes a story about convenience Hagen Tönnies www.linkedin.com/in/hagen-toennies 1

Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Containerising Distributed Pipes

a story about convenience

Hagen Tönnies www.linkedin.com/in/hagen-toennies

1

Page 2: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Agenda

• Introduction

• A Distributed Pipe

• Tool Application Stack

• Recap

2

Page 3: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Unix Pipe Recap

• […]In Unix-like computer operating systems, a pipeline is a sequence of processes chained together by their standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one.

3

Page 4: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Unix Philosophy

• Write programs that do one thing and do it well.

• Write programs to work together.

• Write programs to handle text streams, because that is a universal interface.

Peter H. Salus says about the unix philosophy

4

Page 5: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Apache Kafka

5

Page 6: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Apache Kafka Cluster Kafka Broker

6

Zookeeper

Page 7: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Apache Kafka Topic

oldernewer

7

topic

Page 8: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Apache Kafka Partitions

8

Page 9: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Apache Kafka Replications

9

Page 10: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Apache Kafka Distribute Partitions

10

Page 11: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Apache Kafka Producer

oldernewer

Producer

11

Page 12: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Apache Kafka Consumer

Consumer

oldernewer

12

Page 13: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Apache Kafka Consumer Groups

oldernewer

Consumer1Consumer1Group1

Consumer1Consumer1Group2

13

Page 14: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Kafka Streaming API

14

Low-Level High-Level

• Topology Builder • Stream and Table abstractions

• Custom Aggregators • Simple Transformation

• Custom Processors • Simple Joins of Tables and Streams

Page 15: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Kafka Streaming API

15

Table Stream (change-log)

alice | 1

alice | 1charlie | 1

alice | 2charlie | 1

(„alice“ , 1)

(„charlie, 1)

(„alice, 2)

Page 16: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Kafka featuring Unix

(split %1 "\s")

16

Page 17: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Kafka featuring Unix

(split %1 "\s")

kafka

Message Broker Stream processing job

17

Page 18: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Kafka featuring Unix

(sketch %1)(split %1 "\s") (store %1)(agg-by-key %1)

18

Page 19: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Containers

• […] Operating-system-level virtualization is a server virtualization method in which the kernel of an operating system allows the existence of multiple isolated user-space instances

19

Page 20: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Containers

• docker run -t -i CONTAINER-NAME …args

• java -jar …args

20

Page 21: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Our Distributed Pipe Application Stack in Docker

ZooKeeper

kafka

Discovering and configuring services in your infrastructure.

Distributed append log a.k.a Message Broker

enables highly reliable distributed coordination.

Provides a distributed full-text search engine

21

Page 22: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

consul: image: qnib/alpn-consul networks: - network cpu_shares: 4 mem_limit: 1g environment: - DC_NAME=es ports: - “8500:8500"

22

Page 23: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

zookeeper: image: qnib/zookeeper extends: file: base.yml ports: - "2181:2181"

ZooKeeper

23

Page 24: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

kafka-broker: image: kafka:0.10.0.1 extends: file: base.yml service: gaikai volumes: - /tmp/kafka-logs ports: - "9092:9092"

kafka

24

Page 25: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

elasticsearch: image: elasticsearch:1.7 command: "elasticsearch -Des.cluster.name=dp-es -Dnetwork.bind_host=0.0.0.0" environment: - ES_HEAP_SIZE=2g ports: - "9200:9200" - "9300:9300"

25

Page 26: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

26

Page 27: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Stream Processing Recap

27

Power

Simplicity

kafka

Page 28: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

28

kafka

kafka streams

Stream Processing Recap

Page 29: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

• clj-kstream-cutter

• clj-kstream-hh

• clj-kstream-string-long-window-aggregate

• clj-kstream-elasticsearch-sink

https://github.com/sojoner

Our Distributed Tool Application Stack in Docker

29

Page 30: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

https://xkcd.com/297/

30

Page 31: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

clj-kstream-cutter

Edmilson Alves 0 Edmilson Alves -LRB- born February 17 , 1976 -RRB- , is a Brazilian midfielder who currently plays for Roasso Kumamoto in the J. League Division 2 .

[ Edmilson, Alves, 0, Edmilson, Alves, LRB, born …]

31

Input:

Output:

Page 32: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

[ Edmilson, Alves, 0, Edmilson, Alves, LRB, born, …]

Edmilson ~10 Alves ~8

clj-kstream-hh

32

Input:

Output:

Page 33: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

33

Count Min (CM) sketch

Page 34: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

34

CM sketch retrieval

Page 35: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

And ~10

Bob ~7

Alice ~5

Foo ~3

Bar ~2

take top_n

retrieve sketched valueHeavy Hitter for t_1

35

Heavy Hitter

Page 36: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

(defn- heavy-hitter-processor "Main stream processor takes a configuration and a mapper function to apply." [conf] (let [streamBuilder (-> (new TopologyBuilder) (.addSource (:name conf) string_dser string_dser (into-array [(:input-topic conf)])) (.addProcessor "HeavyHitter" (reify ProcessorSupplier (get [this] (get-processor))) (into-array [(:name conf)])) (.addStateStore (->> (Stores/create storeName) (.withStringKeys) (.withLongValues) (.inMemory) (.build)) (into-array ["HeavyHitter"])) (.addSink "Sink" (:output-topic conf) string_ser string_ser (into-array ["HeavyHitter"])))] (.start (KafkaStreams. streamBuilder (get-props conf)))))

36

clj-kstream-hh Topology

Page 37: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

(defn ^Processor get-processor [] (reify org.apache.kafka.streams.processor.Processor (init [this context] (.schedule (:context @application-state) (:time-window @application-state)) (swap! hh/state assoc :top-n 5 :number-of-hashfn 10N :bucket-size 1000N) (reset! hh/hitter ^(priority-map)) (reset! hh/min-sketch (make-array Integer/TYPE 10N 1000N)) …) (process [this key value] (debug "Process (k,v)::" key value) (hh/sketch-value value) (hh/add-to-hitter value) …) (punctuate [this timestamp …) (close [this] (.close (:store @application-state)))))

37

clj-kstream-hh Processor

Page 38: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

38

clj-kstream-hh Container

clj-kstream-hh: image: sojoner/clj-kstream-hh:0.1.0 hostname: clj-kstream-hh container_name: clj-kstream-hh extends: file: base.yml service: sojoner command: "--broker kafka-broker:9092 --input-topic mapped-test-json --output-topic heavy-hitters --window-size 1 --name stream-hh"

Page 39: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Input:

Output:

Alves ~8 Alves ~10 Edmilson ~5 Edmilson~3

Alves ~18 Edmilson ~8

(key, value)

39

clj-kstream-string-long-window-aggregate

Page 40: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

clj-kstream-elasticsearch-sink

Input:

Output:

{ “name“: “Alves“, “count“: 18, “time“: “January 26th 2017, 17:03:00.000” }

Alves ~18

40

Page 41: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Our Distributed Tool Application Stack in Docker

Edmilson Alves 0 Edmilson Alves -LRB- born February 17 , 1976 -RRB- , is a Brazilian midfielder who currently plays for Roasso Kumamoto in the J. League Division 2 .

[ Edmilson, Alves, 0, Edmilson, Alves, LRB, born …]

Alves ~10 Alves ~8

{“name“: “Alves“, “count“: 18, “time“: “January 26th 2017, 17:03:00.000”}

Alves 18

41

Page 42: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Our Distributed Tool Application Stack in Docker

42

Page 43: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

From the development setup…

$ export DOCKER_HOST=tcp://my.desktop.de:257643

Page 44: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

…to a datacenter setup

$ export DOCKER_HOST=tcp://my.datacenter.de:2576

Build a Docker Swarm

44

Page 45: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Current Challenges• Kafka Streams still at least once

• Persistent and durable Storage

• Still need capacity planning

• Testing / Debugging is still a challenge

• Consistency of the state storage

• Processing Time vs. Event Time

• How about Amdahl’s law

45

Page 46: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Recap(split %1 "\s")

46

as Message broker as Stream Processor

(split %1 "\s")

Page 47: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Containerising Distributed Pipes

a story about convenience

(thanks (listening [this]))

47

Page 48: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Sources 1• http://kafka.apache.org/

• https://martin.kleppmann.com/2015/05/06/data-agility-at-strata.html

• https://speakerdeck.com/ept/kafka-and-samza-distributed-stream-processing-in-practice

• https://github.com/mhausenblas/dnpipes

• https://en.wikipedia.org/wiki/Pipeline_%28Unix%29

• https://zookeeper.apache.org/doc/trunk/zookeeperOver.html

• https://github.com/sojoner/container-stacks/tree/master/kafkaelasticsearch

48

Page 49: Containerising Distributed Pipes · Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams,

Sources 2• https://kafka.apache.org/documentation/streams#streams_processor

• https://kafka.apache.org/documentation/streams#streams_dsl

• https://hub.docker.com/r/sojoner/clj-kstream-elasticsearch-sink/

• https://hub.docker.com/r/sojoner/clj-kstream-cutter/

• https://hub.docker.com/r/sojoner/clj-kstream-hh/

• https://hub.docker.com/r/sojoner/clj-kstream-string-long-window-aggregate/

• https://blog.acolyer.org/2016/07/21/time-adaptive-sketches-ada-sketches-for-summarizing-data-streams/

49