Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization

Presented by Patrick Di Loreto, R&D Engineering Lead
14th June 2015
Site: https://developer.williamhill.com/
Blog: http://patricknoir.blogspot.com
Twitter: https://twitter.com/patricknoir


•  WH Labs
•  Omnia – Data Management Platform
–  Omnia Chronos – A distributed integration middleware with Akka and Kafka
–  Omnia Fates – The long-term memory with Apache Cassandra
–  Omnia NeoCortex – Real time and machine learning using Apache Spark
–  Omnia Hermes – Serving layer with Akka CQRS
–  Omnia Infrastructure – Mesos, Marathon and Docker

Introduction

We're Hiring: https://careers.williamhill.com

•  WH Apple Watch App
•  Interactive Scoreboard
•  Virtual Reality Horse Race (Oculus Rift)

Omnia Platform Reactive Distributed Data Platform

Based on a Lambda Architecture Respecting Reactive Principles

•  Chronos – Data Source
•  Fates – Batch Layer
•  NeoCortex – Speed Layer
•  Hermes – Serving Layer
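In a Lambda Architecture, the serving layer typically answers a query by combining a precomputed batch view with a view covering the most recent events. The following is a minimal sketch of that idea in plain Python; the `serve` function and the sample counts are hypothetical and are not taken from Omnia.

```python
from collections import Counter


def serve(batch_view: Counter, speed_view: Counter, key: str) -> int:
    """Answer a query by merging the batch view with the speed view."""
    # Counter returns 0 for unknown keys, so missing entities are handled.
    return batch_view[key] + speed_view[key]


# Hypothetical data: number of bets per customer.
batch_view = Counter({"customer-1": 40, "customer-2": 7})  # computed from full history
speed_view = Counter({"customer-1": 2})                    # incidents since the last batch run

total = serve(batch_view, speed_view, "customer-1")
```

Here the batch view plays the role of Fates, the speed view the role of NeoCortex, and `serve` the role of Hermes.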

Omnia – Data Management Platform

[Diagram: Omnia and its components Chronos, Fates, Hermes and NeoCortex]

Omnia & Lambda Architecture

[Diagram: Chronos (Data Source), NeoCortex (Speed Layer), Fates (Batch Layer), Hermes (Serving Layer)]

Omnia Principles

http://www.reactivemanifesto.org/

•  Scalable
•  Fault Tolerant
•  Highly Available

Omnia Chronos – Data Source

Omnia Chronos

Chronos is in charge of collecting data from different sources and organising it into a stream of observable events.

Observable [ ]

•  Social media
•  Facebook
•  Twitter
•  Affiliates

•  Page viewing
•  Articles read, following and followers, bets etc…

•  Sports related
•  Tweets
•  News
•  Gaming

•  Web Analytics
•  Activities within our applications

Internal, Product Centric

External, Customer Centric

{
  "type" : "bet",
  "version" : "1.0",
  "time" : "2015-06-03 08:00:31",
  "acquisitionTime" : ". . .",
  "source" : "WHBetSystem",
  "payload" : { … any valid json }
}
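The incident envelope above can be built and checked with a few lines of code. This is a minimal sketch in Python, assuming that `acquisitionTime` uses the same format as `time` (the slide elides its value) and that the payload only needs to be JSON-serialisable; `make_incident`, `is_valid` and the sample payload are hypothetical names, not part of Chronos.

```python
import json
from datetime import datetime

REQUIRED_FIELDS = ("type", "version", "time", "acquisitionTime", "source", "payload")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed for both timestamps


def make_incident(type_, source, payload, time, acquisition_time, version="1.0"):
    """Wrap a payload in the incident envelope shown on the slide."""
    return {
        "type": type_,
        "version": version,
        "time": time.strftime(TIME_FORMAT),
        "acquisitionTime": acquisition_time.strftime(TIME_FORMAT),
        "source": source,
        "payload": payload,
    }


def is_valid(incident: dict) -> bool:
    """Check that all envelope fields exist and the payload is valid JSON."""
    if any(field not in incident for field in REQUIRED_FIELDS):
        return False
    try:
        json.dumps(incident["payload"])
    except (TypeError, ValueError):
        return False
    return True


incident = make_incident(
    "bet",
    "WHBetSystem",
    {"stake": 10},  # hypothetical payload
    datetime(2015, 6, 3, 8, 0, 31),
    datetime(2015, 6, 3, 8, 0, 32),
)
```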

Omnia Chronos

In Chronos you define streams that collect data, convert it and persist it as a stream of Observable[Incident].

[Diagram: Chronos containing Stream 1, Stream 2 and Stream 3]

Omnia Chronos - Clustering

[Diagram: cluster of Chronos 1, Chronos 2 and Chronos 3, with Twitter as a source]

Omnia Chronos

•  Each stream is an actor which supervises its children:
–  Adapter Actor
–  Converter Actor
–  Persistence Manager Actor
•  Stream actors are referentially transparent through the use of Akka Cluster: we have extended Akka Cluster to migrate the stream actors based on resource KPIs
•  Data is persisted in Kafka for durability
•  Chronos is built on top of Akka, ScalaRx and the Play framework; a migration to Akka Streams is planned
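The adapter, converter and persistence steps of a stream can be pictured as a small pipeline. The sketch below is plain Python rather than Akka: ordinary functions stand in for the three child actors, a list stands in for a Kafka topic, and catching an exception and continuing is only an assumed stand-in for supervision. All names and sample data are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Incident = Dict[str, Any]


class Stream:
    """Runs raw items through adapter -> converter -> persistence."""

    def __init__(
        self,
        adapter: Callable[[], Iterable[Any]],
        converter: Callable[[Any], Incident],
        persist: Callable[[Incident], None],
    ) -> None:
        self.adapter = adapter
        self.converter = converter
        self.persist = persist
        self.failures = 0

    def run(self) -> int:
        """Process everything the adapter yields; return how many incidents were persisted."""
        persisted = 0
        for raw in self.adapter():
            try:
                self.persist(self.converter(raw))
                persisted += 1
            except Exception:
                # Assumed policy: count the failure and carry on with the next item.
                self.failures += 1
        return persisted


topic: List[Incident] = []  # stand-in for a durable Kafka topic


def tweet_adapter() -> Iterable[Any]:
    return ["Great match!", None, "Come on!"]  # hypothetical raw input


def tweet_converter(raw: Any) -> Incident:
    if raw is None:
        raise ValueError("empty tweet")
    return {"type": "tweet", "source": "Twitter", "payload": {"text": raw}}


stream = Stream(tweet_adapter, tweet_converter, topic.append)
persisted = stream.run()
```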

Omnia Fates

Fates represents the long-term memory of Omnia. It is in charge of organising all the incidents recorded by Chronos into timelines and of creating new information as views by using machine learning, logical reasoning and time series analysis.

•  A timeline represents the history: the sequence of incidents performed by a specific entity over time. Timelines are organised by category. An example is the customer timeline, which might contain all the bets placed, deposit and withdrawal activities, tweets etc. performed by a specific customer. A timeline category is not limited to customers; it can be anything, for example a sport event such as a football match or a competition.
•  Views are the result of job tasks that elaborate data from:
–  Timelines
–  Other Views

Omnia Fates

Timelines are created from timeline streams: each timeline stream reads data from a Chronos stream and feeds the right timeline.

Omnia Fates

[Diagram: Chronos and Fates]

Omnia Fates - Cassandra

•  Fates persists timelines of incidents.
•  Column Family Name: <TimelineCategory>_tl
•  Key Definition: ( (entityId, date), timestamp )
•  The partition key is a strong hash key: well balanced Cassandra Cluster
•  Composite key: incidents are ordered by timestamp under a specific entity within a day (date = yyyy-MM-dd)
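The effect of the key definition ( (entityId, date), timestamp ) can be illustrated without a database. The sketch below is an in-memory Python model, not Cassandra code: a dictionary keyed by (entityId, date) stands in for partitions, and rows inside each partition are kept sorted by timestamp, which is what a clustering column provides. `TimelineTable` and the sample incidents are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(entity_id: str, ts: datetime) -> tuple:
    """(entityId, date) with date formatted as yyyy-MM-dd."""
    return (entity_id, ts.strftime("%Y-%m-%d"))


class TimelineTable:
    """In-memory model of a <TimelineCategory>_tl column family."""

    def __init__(self) -> None:
        self._partitions = defaultdict(list)

    def insert(self, entity_id: str, ts: datetime, incident: str) -> None:
        rows = self._partitions[partition_key(entity_id, ts)]
        rows.append((ts, incident))
        rows.sort(key=lambda row: row[0])  # clustering order by timestamp

    def read_day(self, entity_id: str, date: str) -> list:
        """Return one entity's incidents for one day, oldest first."""
        return [incident for _, incident in self._partitions.get((entity_id, date), [])]


customer_tl = TimelineTable()
# Inserted out of order on purpose; reads still come back ordered.
customer_tl.insert("customer-1", datetime(2015, 6, 3, 9, 30, 0), "bet")
customer_tl.insert("customer-1", datetime(2015, 6, 3, 8, 0, 31), "deposit")
customer_tl.insert("customer-1", datetime(2015, 6, 4, 10, 0, 0), "tweet")
```

Reading one entity for one day touches a single partition, while including the date in the partition key keeps any one partition from growing without bound.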

Omnia Fates

•  We build views with jobs able to perform:
–  Logical reasoning: deduction, induction, abduction
–  Time line analysis: trends, cycles, seasonality
–  Other ML: classification, clustering, predictions
•  Jobs are performed on top of NeoCortex

Omnia Neo Cortex

Omnia Neo Cortex

•  Neo Cortex is a library developed on top of Apache Spark to give developers an easy way to write micro services on top of Omnia.
•  In NeoCortex we use the distributed nature of Spark to perform fast, real-time data processing, and we hide from the developer the problems of connecting to the source system (Chronos) and to the publishing layer.
•  Typeclass definitions for: Timeline, View, ChronosStream etc…
•  Typeclass definitions for algebraic structures:
–  Monoids, Rings, Groups, providing advanced functions for: moving averages, ARX, ARMA etc.
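To show why a monoid is a convenient basis for functions such as moving averages, here is a minimal sketch in Python. It is not the NeoCortex API: Python has no typeclasses, so a small immutable class with an identity element and an associative `combine` stands in for one. `Avg` and `moving_average` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Sequence


@dataclass(frozen=True)
class Avg:
    """Average as a monoid: (total, count) pairs combine associatively."""

    total: float = 0.0
    count: int = 0

    def combine(self, other: "Avg") -> "Avg":
        return Avg(self.total + other.total, self.count + other.count)

    @property
    def value(self) -> float:
        return self.total / self.count if self.count else 0.0


EMPTY = Avg()  # identity element: combining with it changes nothing


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Average of each full window of `window` consecutive values."""
    result = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        merged = reduce(Avg.combine, (Avg(v, 1) for v in chunk), EMPTY)
        result.append(merged.value)
    return result


averages = moving_average([1, 2, 3, 4, 5], 3)
```

Because `combine` is associative, partial results computed on different partitions can be merged in any grouping, which is what makes this style a good fit for distributed processing.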

Omnia Neo Cortex

Omnia Neo Cortex - Parallelism

[Diagram: chronos stream, Driver, Executors 1 to 4, Stage 1 (map), Stage 2 (reduceByKey), Fates timelines and views, Hermes (Serving Layer)]
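The two stages named in the diagram, map followed by reduceByKey, can be sketched without a cluster. The code below is plain Python that imitates the semantics of those two Spark operations on a local list; it does not use Spark, and the incident fields (`customer`, `stake`) are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_stage(incidents: Iterable[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Stage 1 (map): turn each incident into a (key, value) pair."""
    return [(i["payload"]["customer"], i["payload"]["stake"]) for i in incidents]


def reduce_by_key(pairs: Iterable[Tuple[str, int]],
                  func: Callable[[int, int], int]) -> Dict[str, int]:
    """Stage 2 (reduceByKey): merge all values that share a key with `func`."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


# Hypothetical bet incidents as they might arrive from a Chronos stream.
incidents = [
    {"type": "bet", "payload": {"customer": "customer-1", "stake": 10}},
    {"type": "bet", "payload": {"customer": "customer-2", "stake": 7}},
    {"type": "bet", "payload": {"customer": "customer-1", "stake": 5}},
]

stakes = reduce_by_key(map_stage(incidents), lambda a, b: a + b)
```

In Spark the map stage runs independently on each partition, and the reduceByKey stage requires a shuffle so that all values for a key end up on the same executor.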

Omnia Hermes

Hermes is the layer where data is represented for consumption: B2B and B2C. At its foundation, micro-services, notifications and data as API are key aspects of the design.

•  Scalable and simple full duplex communication for the web
•  Express the correlation between the entities of the model
•  Inspired by Falcor (Netflix) and GraphQL (Facebook)

Hermes

[Diagram: Hermes Distributed Cache; Hermes Node with Local Cache, Subscription Manager, Client Manager, Authentication Handler and Dispatcher; HTTP, WS and TCP; Browser (Hermes JS) and WH Apps]

Omnia Infrastructure – Mesos/Marathon/Docker

Omnia Infrastructure

[Diagram: Omnia, Docker, Marathon and Mesos layered over five nodes]

Use Omnia on Omnia

[Diagram: Mesos, Marathon, Docker (Application Repository), three Docker containers each running an Omnia App with JMX, Health Stream, Chronos, NeoCortex (Speed Layer), Fates (Batch Layer)]

Thank you

Q&A