Architecture for Real-Time and Batch Big-Data Analytics

Architecture for Real-Time and Batch Big Data Analytics

Embed Size (px)

Citation preview

Architecture for Real-Time and Batch Big-Data Analytics

The World's leading Mobile Measurement Platform

Founded in 2011 by Reshef Mann and Oren Kaniel

Just completed Round B funding - total of $28M

Processing 3.5B daily events (it was 1.9B just 2 months ago and 250M at the start of 2014!)

13 people in the development team (we were just 6 people 12 months ago!)

AppsFlyer Who?!

The Way W e W ere

• We had no real concept of “Big Data”- we were just occupied with making the system work

• Even though we were ignorant of the future, we tried to adhere to a few abstractions:

- Small isolated services

- Central concept of message delivery via

- Different tech for different tasks

A message-bus

• CouchDB that served raw reports via views couldn't keep up with view generation

• Python processes that read from the message bus (via pub sub) couldn't keep up with the amount of data

• The split between aggregated and raw data was good, but caused discrepancies because each service failed at a different time

• If the message bus failed (Redis), all other services were also in a fail state – single point of failure

First Creaks In The System

Part 1 of The Solution

• Migrate raw reports from CouchDB to Google's Bigquery (easiest solution at the time)

• Rewrite some of the Python services in a new language that:

– Deals better with strings and allocation of memory

– Can help us scale out

– Has a great ecosystem

– Functional

WHY FUNCTIONAL?!

Why Functional?

Why Clojure?

Sequence based processing capabilities really fit in the visualized data flow of AppsFlyer (processing the Event Stream)

Enforces use of FP paradigm more strictly than Scala

Repl based development

Easy and common Java interop

JVM!

• Python's proprietary serialization

• Python custom data structure as the base message in the system

The Hurdles(or how 2 stupid mistakes can bite

you in the ass 2 years down the road)

Part 2 of The Solution

The Event Stream

Small isolated services

Each service has a single business responsibility

Each service encapsulates its own data (if it has any) and he exposes it over a well known interface

Data objects are always POCO/POJO (simple data structures represented in JSON or EDN)

Preference for queues and buffers that pass isolated data for total async processing

How We Model

How W e Test

No QA team

Each new service can read from the event stream to its heart content - regular Kafka consumers behavior

Each new service handles real life traffic and real life load because it's connected to the event stream

Test DB if needed is easy to spin up on the cloud

Once deemed ready, just throw the switch to on

Service discovery via Consul

How W e Orchestrate

MesosEach service is a single uberjar in a single docker container

DBs get their own dedicated machines

Preference for ring based architecture for DBs

Marathon

Jenkins build server

Deployment via the in-house Santa tool

Consul health checks

Statsd for the JVM and application metrics

Sensu

End-to-end flows

Deployment and

Monitoring

The Bird's Eye View