jonathan-winandy
About me
- My name is Jonathan Winandy (@ahoy_jon).
- I am a data pipeline engineer:
  - I worked on a “DataLake”!
  - I use tools in the larger Java ecosystem like Java, Scala, Clojure, Hadoop …
- And I am an “entrepreneur”.
> Introduction
I cofounded
We do health care oriented software engineering.
We provide:
- Coordination for health care professionals.
- “Big health care Data” pipelines.
> Introduction
What is a Stream? It’s an abstract data structure with the following:
operations:
• append(bytes) -> void?
• readAt(int) -> null | bytes
rule 1: ∀p ∈ ℕ, for some definition of ‘==’:
x := readAt(p)
y := readAt(p)
x != null => x == y
Rule 1 implies: infinite cacheability once the data is available at a position.
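A minimal in-memory sketch of the definition above, in Python. The names `Stream`, `CachingReader` and `backend_reads` are hypothetical and chosen for illustration only. The reader caches every non-null read forever, which rule 1 makes safe; a null read is not cached, because data may still arrive at that position.

```python
from typing import Dict, List, Optional


class Stream:
    """In-memory append-only stream; positions are 0, 1, 2, ..."""

    def __init__(self) -> None:
        self._entries: List[bytes] = []

    def append(self, data: bytes) -> None:
        # Data is only ever added at the end, never modified in place.
        self._entries.append(bytes(data))

    def read_at(self, position: int) -> Optional[bytes]:
        # Returns None while nothing has been written at this position.
        if 0 <= position < len(self._entries):
            return self._entries[position]
        return None


class CachingReader:
    """Caches non-null reads forever, relying on rule 1."""

    def __init__(self, stream: Stream) -> None:
        self._stream = stream
        self._cache: Dict[int, bytes] = {}
        self.backend_reads = 0  # how often the underlying stream was asked

    def read_at(self, position: int) -> Optional[bytes]:
        if position in self._cache:
            return self._cache[position]
        self.backend_reads += 1
        value = self._stream.read_at(position)
        if value is not None:
            # Safe to keep forever: the value at this position cannot change.
            self._cache[position] = value
        return value


stream = Stream()
reader = CachingReader(stream)
first_try = reader.read_at(0)  # nothing written yet: None, not cached
stream.append(b"hello")
x = reader.read_at(0)          # goes to the stream
y = reader.read_at(0)          # served from the cache
```

Here `x == y` holds as rule 1 requires, and the second read of position 0 never reaches the stream.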
> Theory
Streams are the simplest way to manage data.
And they are naturally compatible with how a single observer perceives information …
Stream positions: 0 1 2 3 4 5 6
> Theory
https://www.youtube.com/watch?v=ggCffvKEJmQ
Peter Alvaro - Outwards from the Middle of the Maze
> Dist systems
> Summary
The need for a unified log arises ‘quickly’ in apps that manage state (or multiple states) when they need to do:
- Business Intelligence,
- Notifications,
- Advanced search (secondary indexing),
- ….
But there is a lot of legacy in projects and practices, and this technique has been regularly “forgotten*”.
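To make the unified log idea concrete, here is a small Python sketch under stated assumptions: `UnifiedLog`, `Consumer` and the order events are hypothetical, not taken from the talk. The application appends each event once, and every derived need (BI counters, notifications, a secondary search index) is an independent consumer that reads the same log at its own pace.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Event = Dict[str, str]


class UnifiedLog:
    """Single ordered log that every derived view reads from."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def read_from(self, offset: int) -> List[Event]:
        return self._events[offset:]


class Consumer:
    """Tracks its own offset, so each view progresses independently."""

    def __init__(self, log: UnifiedLog, apply: Callable[[Event], None]) -> None:
        self._log = log
        self._apply = apply
        self.offset = 0

    def poll(self) -> int:
        events = self._log.read_from(self.offset)
        for event in events:
            self._apply(event)
        self.offset += len(events)
        return len(events)


# Three derived views fed by the same log.
orders_per_user: Dict[str, int] = defaultdict(int)      # Business Intelligence
notifications: List[str] = []                           # Notifications
search_index: Dict[str, Set[str]] = defaultdict(set)    # Secondary index


def update_bi(event: Event) -> None:
    orders_per_user[event["user"]] += 1


def notify(event: Event) -> None:
    notifications.append(f"{event['user']} ordered {event['item']}")


def index(event: Event) -> None:
    for word in event["item"].split():
        search_index[word].add(event["order"])


log = UnifiedLog()
consumers = [Consumer(log, update_bi), Consumer(log, notify), Consumer(log, index)]

log.append({"user": "alice", "order": "o1", "item": "blue bike"})
log.append({"user": "bob", "order": "o2", "item": "red helmet"})
log.append({"user": "alice", "order": "o3", "item": "red bike"})

for consumer in consumers:
    consumer.poll()
```

Adding a fourth need later means adding a fourth consumer that replays the log from offset 0; the writing side does not change.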
Diagram: Producer, ZK, Broker 1, Broker 2, Broker 3
1. Hello ZK, do you know where I can find some brokers?
2. Ahoy?
3. Want some data?
> Producer
Message acking for producer (“write concern”)
0 : Here a messa’
1 : If at least you, leader of this partition, received it and saved it, I am ok.
-1: Hey, I just sent you a message. I know it’s maybe too much to ask, but are you really sure you saved it? OK, and did all brokers in the “In Sync Replicas” for this partition save it too? I know I am … but this information is really important for our $$$$.
Trade-off: from 0 to -1, speed decreases and durability increases.
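The three acking levels can be illustrated with a toy model in Python. This is a sketch under assumptions, not Kafka's actual protocol: `Replica`, `Partition` and `send` are hypothetical names, and replication is reduced to appending to in-memory lists. `send` returns how many copies are confirmed as saved at the moment the producer is released.

```python
from typing import List


class Replica:
    """A broker holding a copy of the partition (toy model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[bytes] = []

    def save(self, message: bytes) -> None:
        self.log.append(message)


class Partition:
    """Leader plus the followers currently in the In Sync Replicas set."""

    def __init__(self, leader: Replica, in_sync_followers: List[Replica]) -> None:
        self.leader = leader
        self.followers = in_sync_followers
        self.in_flight: List[bytes] = []            # sent, nothing confirmed
        self.pending_replication: List[bytes] = []  # on the leader only

    def send(self, message: bytes, acks: int) -> int:
        """Return the number of copies confirmed when the producer moves on."""
        if acks == 0:
            # Fire and forget: the producer does not wait for anyone.
            self.in_flight.append(message)
            return 0
        self.leader.save(message)
        if acks == 1:
            # Only the leader has confirmed; followers will catch up later.
            self.pending_replication.append(message)
            return 1
        if acks == -1:
            # Wait until every in-sync follower has saved the message too.
            for follower in self.followers:
                follower.save(message)
            return 1 + len(self.followers)
        raise ValueError(f"unsupported acks value: {acks}")


leader = Replica("broker-1")
followers = [Replica("broker-2"), Replica("broker-3")]
partition = Partition(leader, followers)

confirmed_0 = partition.send(b"a", acks=0)      # fastest, least durable
confirmed_1 = partition.send(b"b", acks=1)
confirmed_all = partition.send(b"c", acks=-1)   # slowest, most durable
```

In this model a leader crash right after the `acks=1` send would lose `b"b"`, since no follower holds it yet, while `b"c"` already exists on all three replicas.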
> Producer
https://github.com/bulldog2011/luxun
Kafka has unique properties. PLEASE: don’t try to use something else instead.
We should talk about CAP
But CAP is about mutation
And consistency is a complicated subject
A quick note on Causality
If you don’t ensure causality for web apps, some strange behaviors may arise:
- Sometimes, as a user, I cannot see my own “edits”.
- Sometimes, as a customer, I cannot buy on the website after I check out my basket.
“Who is faster, the Data bus or the client?”
You don’t want to bet, especially under load.
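One common way to avoid betting on that race is to hand the client a position token with each write and require reads to be at least that fresh. The Python sketch below assumes this approach; it is not taken from the talk, and `DataBus`, `ReadView` and `min_offset` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


class DataBus:
    """Ordered log of (key, value) writes."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def publish(self, key: str, value: str) -> int:
        self.events.append((key, value))
        # Token: number of events a view must have applied to see this write.
        return len(self.events)


class ReadView:
    """State derived from the bus; it may lag behind the writers."""

    def __init__(self, bus: DataBus) -> None:
        self._bus = bus
        self.applied = 0
        self.state: Dict[str, str] = {}

    def catch_up(self, up_to: int) -> None:
        target = min(up_to, len(self._bus.events))
        while self.applied < target:
            key, value = self._bus.events[self.applied]
            self.state[key] = value
            self.applied += 1

    def read(self, key: str, min_offset: int = 0) -> Optional[str]:
        # Causality check: never answer from a state older than the
        # write the client already knows about.
        if self.applied < min_offset:
            self.catch_up(min_offset)
        return self.state.get(key)


bus = DataBus()
view = ReadView(bus)

token = bus.publish("profile:42", "new bio")

# Without the token the client wins the race and does not see its own edit.
stale = view.read("profile:42")

# With the token the view catches up before answering.
fresh = view.read("profile:42", min_offset=token)
```

In a real deployment the view would wait for its consumer to reach the offset (or time out) instead of pulling synchronously; the sketch only shows where the check goes.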
Bonus: What is a CAS?
A Content Addressable Storage is a specific “key value store”:
operations:
- store(bytes) -> key
- get(key) -> null | bytes
rule 1: key = h(data), h being a cryptographic hash function like md5 or sha1.
rule 2: ∀data, get(store(data)) = data
Rules 1 and 2 imply: infinite cacheability and scalability.
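A minimal in-memory sketch of these two rules in Python. `ContentAddressableStore` is a hypothetical name; sha1 is used only because the slide names it, and a stronger hash such as SHA-256 would be the usual choice when collision resistance matters.

```python
import hashlib
from typing import Dict, Optional


class ContentAddressableStore:
    """Key-value store where the key is derived from the content."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def store(self, data: bytes) -> str:
        # Rule 1: key = h(data).
        key = hashlib.sha1(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> Optional[bytes]:
        return self._blobs.get(key)


cas = ContentAddressableStore()
data = b"some health care record"

key = cas.store(data)
same_key = cas.store(data)   # storing identical content is idempotent
roundtrip = cas.get(key)     # rule 2: get(store(data)) = data
missing = cas.get("no-such-key")
```

Because a key can only ever map to one content, any node can cache or replicate a blob without coordination, which is where the cacheability and scalability come from.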