A data layer in Clojure @sbelak [email protected]

Page 1: A data layer in clojure

A data layer in Clojure

@sbelak [email protected]

Page 2: A data layer in clojure

• Started in machine learning

• Turned to data science and helped 20+ companies become data-driven

• Now leading data science department at GoOpti

Page 3: A data layer in clojure

Self-service infrastructure for data scientists

Page 4: A data layer in clojure

The analytics chasm

• < 2min: Ideal. Almost real-time, can be done during brainstorming without disrupting flow

• < 20min: squeeze in somewhere in the day

• project: roadmap ahoy!

[Diagram label: fail]

Page 5: A data layer in clojure

My goto architecture

[Diagram components: DB, Events, Kafka, Onyx (three instances)]

Persist all events to S3

• time travel

• query with AWS Athena

Page 6: A data layer in clojure

Onyx

a masterless, cloud scale, fault tolerant, high performance distributed computation system

… written entirely in Clojure

Page 7: A data layer in clojure

Clojure at a glance

• Lisp running on JVM

• Functional, dynamic, immutable

• Excellent concurrency and state management support

• Unparalleled data manipulation

• Good Java interoperability

Page 8: A data layer in clojure

Onyx at GoOpti

• In production for almost a year

• ETL

• online machine learning

• offline (batch) machine learning

• ad-hoc analysis

Page 9: A data layer in clojure

Onyx at a glance

Page 10: A data layer in clojure

Job = workflow + flow conditions + catalogue

Workflow:

    [[:input :processing-1]
     [:input :processing-2]
     [:processing-1 :output-1]
     [:processing-2 :output-2]]

Flow conditions:

    [{:flow/from :input-stream
      :flow/to [:process-adults]
      :flow/predicate :my.ns/adult?
      :flow/doc "Emits segment if an adult."}]

Catalogue:

    [{:onyx/name :add-5
      :onyx/fn :my/adder
      :onyx/type :function
      :my/n 5
      :onyx/params [:my/n]}

     {:onyx/name :in
      :onyx/plugin :onyx.plugin.core-async/input
      :onyx/type :input
      :onyx/medium :core.async
      :onyx/batch-size batch-size
      :onyx/max-peers 1
      :onyx/doc "Reads segments from a core.async channel"}

     {:onyx/name :out
      :onyx/plugin :onyx.plugin.core-async/output
      :onyx/type :output
      :onyx/medium :core.async
      :onyx/doc "Writes segments to a core.async channel"}]
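As a rough illustration of the three parts fitting together, here is a minimal sketch of a job written as one plain map. The top-level keys (`:workflow`, `:flow-conditions`, `:catalog`, `:task-scheduler`) follow the Onyx job map as I understand it (note that Onyx itself spells the key `:catalog`). The task names, the `:age` key, the batch size of 10 and the helper `undefined-tasks` are assumptions made up for this example. The predicate uses the four-argument shape that Onyx flow conditions are documented to receive. Nothing here needs Onyx on the classpath.

```clojure
;; Flow-condition predicate: [event old-segment new-segment all-new-segments].
;; The :age key is an assumption for this sketch.
(defn adult? [_event _old-segment segment _all-new-segments]
  (>= (:age segment 0) 18))

(def workflow
  [[:in :add-5]
   [:add-5 :out]])

(def flow-conditions
  [{:flow/from :in
    :flow/to [:add-5]
    :flow/predicate ::adult?
    :flow/doc "Emits segment if an adult."}])

(def catalog
  [{:onyx/name :add-5
    :onyx/fn :my/adder
    :onyx/type :function
    :my/n 5
    :onyx/params [:my/n]}
   {:onyx/name :in
    :onyx/plugin :onyx.plugin.core-async/input
    :onyx/type :input
    :onyx/medium :core.async
    :onyx/batch-size 10 ; assumed value
    :onyx/max-peers 1
    :onyx/doc "Reads segments from a core.async channel"}
   {:onyx/name :out
    :onyx/plugin :onyx.plugin.core-async/output
    :onyx/type :output
    :onyx/medium :core.async
    :onyx/doc "Writes segments to a core.async channel"}])

;; The whole job is a single value that can be printed, diffed and stored.
(def job
  {:workflow workflow
   :flow-conditions flow-conditions
   :catalog catalog
   :task-scheduler :onyx.task-scheduler/balanced})

;; Hypothetical helper (not part of Onyx): because the job is data,
;; an ordinary function can check it before anything is submitted.
(defn undefined-tasks
  "Returns the set of task names used in the workflow but missing from the catalog."
  [{:keys [workflow catalog]}]
  (let [declared (set (map :onyx/name catalog))]
    (set (remove declared (flatten workflow)))))
```

Calling `(undefined-tasks job)` on the map above returns an empty set; adding an edge to a task with no catalogue entry would make that task show up in the result.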

Page 11: A data layer in clojure

Catalogue

    [{:onyx/name :add-5
      :onyx/fn :my/adder
      :onyx/type :function
      :my/n 5
      :onyx/params [:my/n]}

     {:onyx/name :in
      :onyx/plugin :onyx.plugin.core-async/input
      :onyx/type :input
      :onyx/medium :core.async
      :onyx/batch-size batch-size
      :onyx/max-peers 1
      :onyx/doc "Reads segments from a core.async channel"}

     {:onyx/name :out
      :onyx/plugin :onyx.plugin.core-async/output
      :onyx/type :output
      :onyx/medium :core.async
      :onyx/doc "Writes segments to a core.async channel"}]

Vanilla Clojure function

    (defn adder [n {:keys [x] :as segment}]
      (assoc segment :x (+ n x)))

Plugins (I/O): seq, async, Kafka, Datomic, SQL, S3, SQS, …

[Slide annotations on the catalogue: parameter, self-documenting]
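To make the parameter mechanism concrete, the following sketch mimics in plain Clojure how the values named in `:onyx/params` end up as leading arguments of the task function. `apply-task` is a hypothetical helper written for this illustration, not an Onyx function; Onyx performs the equivalent lookup internally.

```clojure
;; The task function from the slide: n is the injected parameter,
;; the segment (a plain map) is always the last argument.
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(def add-5
  {:onyx/name :add-5
   :onyx/fn :my/adder
   :onyx/type :function
   :my/n 5
   :onyx/params [:my/n]})

;; Hypothetical helper: look up each key listed under :onyx/params in the
;; catalogue entry and pass the values ahead of the segment.
(defn apply-task [task-fn entry segment]
  (let [params (map entry (:onyx/params entry))]
    (apply task-fn (concat params [segment]))))

(apply-task adder add-5 {:x 1})
;; => {:x 6}
```

Changing `:my/n` in the catalogue entry changes the behaviour of the task without touching the function, which is what lets parameters be managed as data.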

Page 12: A data layer in clojure

Computation entirely described with data

data is code!

Page 13: A data layer in clojure

Everything can be run locally!

Page 14: A data layer in clojure

Testing without mocking
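One way to read this slide: task functions are ordinary functions from segments (maps) to segments, so they can be exercised with literal data and no stand-ins for Kafka or other I/O. A minimal sketch using `clojure.test`, reusing the `adder` function from the catalogue slide (the test name and sample segments are made up here):

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

;; Task function under test: a pure function over a segment.
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(deftest adder-test
  ;; A single segment in, a single segment out.
  (is (= {:x 6} (adder 5 {:x 1})))
  ;; Keys the function does not know about pass through untouched.
  (is (= {:x 6 :id 7} (adder 5 {:x 1 :id 7})))
  ;; A batch of segments is just a sequence of maps.
  (is (= [{:x 15} {:x 25}]
         (map (partial adder 5) [{:x 10} {:x 20}]))))

(run-tests)
```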

Page 15: A data layer in clojure

Resilience and handling state

• Activity log

• Window and trigger states checkpointed

• Resume points

• Configurable flux policies

Page 16: A data layer in clojure

How Onyx rewired my brain

Page 17: A data layer in clojure

It’s not about scaling, but clean architecture

Page 18: A data layer in clojure

Decomplect everything

Page 19: A data layer in clojure

Computation graphs
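Since a workflow is a vector of edges, the computation graph can be queried with a few lines of ordinary Clojure. The sketch below uses the workflow from the job slide; the helpers `adjacency`, `sources` and `downstream` are hypothetical names introduced for this example and are not part of Onyx.

```clojure
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

(defn adjacency
  "Turns the edge list into a map of task -> vector of direct successors."
  [workflow]
  (reduce (fn [g [from to]] (update g from (fnil conj []) to))
          {}
          workflow))

(defn sources
  "Tasks that no edge points to, i.e. the inputs of the graph."
  [workflow]
  (let [targets (set (map second workflow))]
    (set (remove targets (map first workflow)))))

(defn downstream
  "All tasks reachable from the given task."
  [workflow task]
  (let [g (adjacency workflow)]
    (loop [seen #{}
           frontier (g task)]
      (if-let [t (first frontier)]
        (if (seen t)
          (recur seen (rest frontier))
          (recur (conj seen t) (concat (rest frontier) (g t))))
        seen))))

(sources workflow)
;; => #{:input}
(downstream workflow :processing-1)
;; => #{:output-1}
```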

Page 20: A data layer in clojure

Machine learning with Onyx

• Hyperparameter server built on top of Onyx parameters

• Batch & streaming mode

• Monitoring: performance metrics, side channels for partial results/introspection into computation

• Everything is data, so it is easy to build tools around it
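As an illustration of the last bullet (and not a reconstruction of the hyperparameter server itself), here is a sketch of a small tool that expands one catalogue entry into a grid of variants. All names under `:my.ml/`, as well as `grid` and `variants`, are hypothetical and chosen only for this example.

```clojure
;; A catalogue entry whose hyperparameters are ordinary keys,
;; exposed to the task function through :onyx/params.
(def base-entry
  {:onyx/name :train
   :onyx/fn :my.ml/train-step ; hypothetical task function
   :onyx/type :function
   :my.ml/learning-rate 0.1
   :my.ml/l2 0.0
   :onyx/params [:my.ml/learning-rate :my.ml/l2]})

(defn grid
  "Cartesian product of a map of key -> candidate values, as a seq of maps."
  [space]
  (reduce (fn [acc [k vs]]
            (for [m acc, v vs] (assoc m k v)))
          [{}]
          space))

(defn variants
  "One catalogue entry per point in the hyperparameter grid."
  [entry space]
  (map #(merge entry %) (grid space)))

(def candidates
  (variants base-entry
            {:my.ml/learning-rate [0.01 0.1]
             :my.ml/l2 [0.0 0.5]}))

(count candidates)
;; => 4
```

Each element of `candidates` is a complete catalogue entry, so it could be dropped into a job map unchanged.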

Page 21: A data layer in clojure

Onyx/Pyroclast

Page 22: A data layer in clojure

Putting “data is code” to work

Page 23: A data layer in clojure

Describing data with clojure.spec

composing smaller parts into the whole

code is data!
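A small sketch of what describing data with `clojure.spec` by composing smaller parts can look like. The spec names (`:shop/order`, `:order/user` and so on) and the sample values are assumptions for this example; the functions used (`s/def`, `s/keys`, `s/valid?`) are standard `clojure.spec.alpha`.

```clojure
(require '[clojure.spec.alpha :as s])

;; Small parts: individual attributes.
(s/def :user/account-age nat-int?)
(s/def :user/country string?)
(s/def :order/promo-code string?)

;; Composed into larger wholes: a user is made of attributes,
;; an order always has a user and may have a promo code.
(s/def :order/user (s/keys :req [:user/account-age :user/country]))
(s/def :shop/order (s/keys :req [:order/user]
                           :opt [:order/promo-code]))

(s/valid? :shop/order
          {:order/user {:user/account-age 42
                        :user/country "SI"}})
;; => true

(s/valid? :shop/order {:order/promo-code "SPRING"})
;; => false, the required :order/user is missing
```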

Page 24: A data layer in clojure

Queryable data descriptions

Turn spec into a graph

A fully interactive and open type system!

[Graph: nodes order, promo code, user, account age, country; edge labels: always (three edges), maybe (one edge)]
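A minimal sketch of how a spec could be turned into such a graph, assuming only `s/keys` specs with plain keyword entries (no `or`/`and` groups inside `:req`). The spec names and the helpers `keys-edges` and `spec-graph` are made up for this example; required keys become `:always` edges and optional keys become `:maybe` edges.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :user/account-age nat-int?)
(s/def :user/country string?)
(s/def :order/promo-code string?)
(s/def :order/user (s/keys :req [:user/account-age :user/country]))
(s/def :shop/order (s/keys :req [:order/user]
                           :opt [:order/promo-code]))

(defn keys-edges
  "Edges [from to label] for one registered s/keys spec, nil for anything else."
  [spec-name]
  (when-let [spec (s/get-spec spec-name)]
    (let [form (s/form spec)]
      ;; s/form returns the spec as data, e.g.
      ;; (clojure.spec.alpha/keys :req [...] :opt [...])
      (when (and (seq? form) (= 'clojure.spec.alpha/keys (first form)))
        (let [{:keys [req opt]} (apply hash-map (rest form))]
          (concat (for [k req] [spec-name k :always])
                  (for [k opt] [spec-name k :maybe])))))))

(defn spec-graph
  "Walks the spec registry from root and collects all edges breadth-first."
  [root]
  (loop [edges []
         todo [root]
         seen #{}]
    (if-let [k (first todo)]
      (if (seen k)
        (recur edges (rest todo) seen)
        (let [es (keys-edges k)]
          (recur (into edges es)
                 (concat (rest todo) (map second es))
                 (conj seen k))))
      edges)))

(spec-graph :shop/order)
```

The result is a vector of plain triples, so it can be filtered, joined or rendered like any other data.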

Page 25: A data layer in clojure

“Composition is about decomposing.”

— E. Normand

Page 26: A data layer in clojure

Case study: autogenerating materialised views

[Diagram components: Events, External data, Kafka, Onyx (three instances), Materialised views]

Automatic view generation

• Event & attribute ontology

  • Manual (via spec)

  • Inferred

• Statistical analysis (seasonality detection, outlier removal, …)

Page 27: A data layer in clojure

Automatic view generation

1. Walk spec registry

2. Apply rules

   1. Define new view (spec)

   2. Trigger Onyx job that creates the view
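The steps above can be sketched roughly as follows. This is an illustration under heavy assumptions, not the system described in the talk: the `:event/...` specs, the single rule `sum-numeric-rule`, the `:view/...` keys and `view->job` are all invented here, and the final step only builds a job description as data instead of submitting anything to Onyx. `s/registry` and `s/form` are standard `clojure.spec.alpha` functions.

```clojure
(require '[clojure.spec.alpha :as s])

;; Example event ontology, declared manually via spec.
(s/def :event/order-total number?)
(s/def :event/country string?)
(s/def :event/order (s/keys :req [:event/order-total :event/country]))

;; A rule takes a spec name and its form and returns a view
;; description, or nil when it does not apply.
(defn sum-numeric-rule [spec-name form]
  (when (= form 'clojure.core/number?)
    {:view/name (keyword "view" (str (name spec-name) "-sum"))
     :view/source spec-name
     :view/aggregation :sum}))

(def rules [sum-numeric-rule])

(defn propose-views
  "Step 1 and 2: walk the registry, apply every rule to every spec in the namespace."
  [spec-ns]
  (for [[spec-name spec] (s/registry)
        :when (= spec-ns (namespace spec-name))
        rule rules
        :let [view (rule spec-name (s/form spec))]
        :when view]
    view))

(defn view->job
  "Step 2.2, as data only: a hypothetical job description for one view."
  [view]
  {:workflow [[:events (:view/name view)]]
   :view view})

(map view->job (propose-views "event"))
```

With the three specs above, only `:event/order-total` matches the rule, so a single view named `:view/order-total-sum` is proposed.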

Page 28: A data layer in clojure

Takeaways

Page 29: A data layer in clojure

Everything should be live and interactive

Page 30: A data layer in clojure

Computation graphs are a great way to structure data processing code

Page 31: A data layer in clojure

Queryable data and computation descriptions supercharge interactive development and are a great building block for automation