Real-time and long-time together

1©MapR Technologies - Confidential

Real-time and Long-time with Storm and Hadoop


Real-time and Long-time with Storm and Hadoop MapR


Contact:– [email protected]– @ted_dunning

Slides and such (available late tonight):– http://info.mapr.com/ted-utahjug

Hash tags: #mapr #storm

mailto:[email protected]


The Challenge

Hadoop is great of processing vats of data– But sucks for real-time (by design!)

Storm is great for real-time processing– But lacks any way to deal with batch processing

It sounds like there isn’t a solution– Neither fashionable solution handles everything


This is not a problem.

It’s an opportunity!


What is Map-Reduce?

Map-reduce programs are defined (mostly) by– A map function that does independent record transformations (and

deletions and replications)– Reduce functions that do aggregation

Map-reduce programs run in framework that– Schedules and re-runs tasks– Splits the input– Moves map outputs to reduce inputs– Receives the results

6


Inside Map-Reduce

7

Input Map CombineShuffleand sort

Reduce Output

Reduce

"The time has come," the Walrus said,"To talk of many things:Of shoes—and ships—and sealing-wax

the, 1time, 1has, 1come, 1…

come, [3,2,1]has, [1,5,2]the, [1,2,1]time, [10,1,3]…

come, 6has, 8the, 4time, 14…


Not Just Text

Counting words is easy

Many other problems work as well– Sessionize user logs– Very large scale joins– Large scale matrix recommendations– Computing the quadrillionth digit of π

Map-reduce is inherently batch oriented


Inside Map-Reduce

9

Input Map Shuffleand sort

Reduce Output

road1, polyline(p1, p2, p3, …)lake1, polygon(p4, p5, p7, p9, …)road2, polyline(p6, p7, p9, …)

tile0918-1412, road1tile1082-8143, road1tile0014-3284, lake1tile1082-8143, lake1…

tile0918-1412, [road1]tile1082-8143, [road1, lake1]tile0014-3284, [lake1]…

tile1082-8143, img#1tile0014-3284, img#2tile1082-8143, img#3…


What is Storm?

A Storm program is called a topology– Spouts inject data into a topology– Bolts process data

The units of data are called tuples All processing is flow-through Bolts can buffer or persist Output tuples can be anchored If a tuple and all descendants are not ack’ed in time, bolt has died Bolts that fail are restarted and un-acked tuples are replayed


t

now

Hadoop is Not Very Real-time

UnprocessedData

Fully processed

Latest full period

Hadoop job takes this long for this data


t

now

Hadoop works great back here

Storm workshere

Real-time and Long-time together

Blended view

Blended view

Blended View


One Alternative

Search Engine

NoSqlde Jour

Consumer

Real-time Long-time

?


Problems

Simply dumping into noSql engine doesn’t quite work Insert rate is limited No load isolation– Big retrospective jobs kill real-time

Low scan performance– Hbase pretty good, but not stellar

Difficult to set boundaries– where does real-time end and long-time begin?


Data Sources

Catcher Cluster

Rough Design – Data Flow

Catcher Cluster

Query Event Spout

Logger Bolt

Counter Bolt

Raw Logs

LoggerBolt

Semi Agg

Hadoop Aggregator

Snap

Long agg

ProtoSpout Counter Bolt

Logger Bolt

Data Sources


Closer Look – Catcher Protocol

Data Sources

Catcher ClusterCatcher Cluster

Data Sources

The data sources and catchers communicate with a very simple protocol.

Hello() => list of catchersLog(topic,message) => (OK|FAIL, redirect-to-catcher)


Closer Look – Catcher Queues

Catcher Cluster

Catcher Cluster

The catchers forward log requests to the correct catcher and return that host in the reply to allow the client to avoid the extra hop.

Each topic file is appended by exactly one catcher.

Topic files are kept in shared file storage.

TopicFile

TopicFile


Closer Look – ProtoSpout

The ProtoSpout tails the topic files,parses log records into tuples andinjects them into the Storm topology.

Last fully acked position stored in shared file system.

TopicFile

TopicFile

ProtoSpout


Closer Look – Counter Bolt

Critical design goals:– fast ack for all tuples– fast restart of counter

Ack happens when tuple hits the replay log (10’s of milliseconds) Restart involves replaying semi-agg’s + replay log (very fast) Replay log only lasts until next semi-aggregate goes out

Counter Bolt

Replay Log

Semi-aggregated records

Incoming records

Real-time Long-time


A Frozen Moment in Time

Snapshot defines the dividing line

All data in the snap is long-time, all after is real-time

Semi-agg strategy allows clean query

Semi Agg

Hadoop Aggregator

Snap

Long agg


Guarantees

Counter output volume is small-ish– the greater of k tuples per 100K inputs or k tuple/s– 1 tuple/s/label/bolt for this exercise

Persistence layer must provide guarantees– distributed against node failure– must have either readable flush or closed-append

HDFS is distributed, but provides no guarantees and strange semantics

MapRfs is distributed, provides all necessary guarantees


Presentation Layer

Presentation must– read recent output of Logger bolt– read relevant output of Hadoop jobs– combine semi-aggregated records

User will see– counts that increment within 0-2 s of events– seamless and accurate meld of short and long-term data


Example 2 – AB testing in real-time

I have 15 versions of my landing page Each visitor is assigned to a version– Which version?

A conversion or sale or whatever can happen– How long to wait?

Some versions of the landing page are horrible– Don’t want to give them traffic


Bayesian Bandit

Compute distributions based on data Sample p1 and p2 from these distributions

Put a coin in bandit 1 if p1 > p2

Else, put the coin in bandit 2


And it works!


Video Demo


The Code

Select an alternative

Select and learn

But we already know how to count!

n = dim(k)[1] p0 = rep(0, length.out=n) for (i in 1:n) { p0[i] = rbeta(1, k[i,2]+1, k[i,1]+1) } return (which(p0 == max(p0)))

for (z in 1:steps) { i = select(k) j = test(i) k[i,j] = k[i,j]+1 } return (k)


The Basic Idea

We can encode a distribution by sampling Sampling allows unification of exploration and exploitation

Can be extended to more general response models


The Bigger Basic Idea

Online algorithms generally have relatively small state (like counting)

Online algorithms generally have a simple update (like counting) If we can do this with counting, we can do it with all kinds of

algorithms


Contact:– [email protected]– @ted_dunning

Slides and such (available late tonight):– Send email !!

Hash tags: #mapr #bigdata

We are hiring!

mailto:[email protected]


Thank You

Technology

Real-time and long-time together