Real Time Analytics with Cassandra

Vagmi Mudumbai (@vagmi / @reducedata)


DESCRIPTION

A recipe for Acunu-style analytics with Cassandra.


Page 1: Real Time Analytics with Cassandra

Vagmi Mudumbai (@vagmi / @reducedata)


What is Cassandra?

Based on Amazon's Dynamo

Built by Facebook

It is both a key-value store and a column-family store.


The CAP Theorem


Column Families


HashMap<RowKey,SortedMap<ColumnName, Value>>
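To make the type signature above concrete, here is a minimal in-memory sketch in Python. It assumes a plain dict for the outer hash map and a sorted list of column names, maintained with `bisect`, standing in for the SortedMap; `ColumnFamily`, `insert`, and `get_slice` are hypothetical names for illustration, not Cassandra's API. The sample row reuses user 2 from the table that follows.

```python
import bisect
from typing import Any, Dict, List, Tuple


class ColumnFamily:
    """Toy model of HashMap<RowKey, SortedMap<ColumnName, Value>> (hypothetical)."""

    def __init__(self) -> None:
        # Outer hash map: row key -> (sorted column names, column name -> value)
        self._rows: Dict[str, Tuple[List[str], Dict[str, Any]]] = {}

    def insert(self, row_key: str, column: str, value: Any) -> None:
        names, values = self._rows.setdefault(row_key, ([], {}))
        if column not in values:
            # Keep column names sorted so range slices are cheap
            bisect.insort(names, column)
        values[column] = value

    def get_slice(self, row_key: str, start: str, end: str) -> List[Tuple[str, Any]]:
        """Return (column, value) pairs with start <= column <= end, in column order."""
        if row_key not in self._rows:
            return []
        names, values = self._rows[row_key]
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, end)
        return [(name, values[name]) for name in names[lo:hi]]


cf = ColumnFamily()
cf.insert("2", "name", "Karthik")
cf.insert("2", "email", "yeskarthik@blah")
cf.insert("2", "country", "IN")
print(cf.get_slice("2", "a", "f"))  # [('country', 'IN'), ('email', 'yeskarthik@blah')]
```

Keeping the inner map sorted is what makes range slices over column names cheap, which is why a row can later hold one column per date and answer a date range with a single slice.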

A user table, row by row:

id  name     email             country
1   Vagmi    [email protected]  IN
2   Karthik  yeskarthik@blah   IN
3   MarkZ    mark@fb           US

The same data as a column family, one row key per user:

RowKey   1                 2                3
name     Vagmi             Karthik          MarkZ
email    [email protected]  yeskarthik@blah  mark@fb
country  IN                IN               US


The Problem

As a user, I want to view real-time metrics and filter them by dimensions like time, city, category, etc.

select sum(measure) from events where time between A and B and country='US' and device_platform='Android'

The wrong way: aggregating raw events at query time.


HashMap<RowKey,SortedMap<ColumnName, Value>>


Counters


create column family view_counts_hourly with comparator=UTF8Type and default_validation_class=CounterColumnType and key_validation_class=UTF8Type;
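Counter columns only accept increments and decrements, and a read returns the running total, so many writers can bump the same counter without reading it first. The sketch below mimics that behaviour in memory; `CounterColumnFamily`, `add`, `get`, and the hour-bucket column name `2014031415` are hypothetical, not part of any Cassandra client.

```python
from collections import defaultdict
from typing import DefaultDict


class CounterColumnFamily:
    """In-memory stand-in for a counter column family (hypothetical, not a client API)."""

    def __init__(self) -> None:
        # row key -> column name -> running total; unseen counters read as 0
        self._rows: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))

    def add(self, row_key: str, column: str, delta: int = 1) -> None:
        # Writers only send a delta; there is no client-side read-modify-write
        self._rows[row_key][column] += delta

    def get(self, row_key: str, column: str) -> int:
        # .get() avoids creating empty rows or columns on read
        return self._rows.get(row_key, {}).get(column, 0)


views = CounterColumnFamily()
for _ in range(3):
    views.add("sid1#us", "2014031415")
views.add("sid1#us", "2014031415", -1)
print(views.get("sid1#us", "2014031415"))  # 2
```

One caveat with real Cassandra counters: an increment is not idempotent, so a client that retries a timed-out write can count the same event twice.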

http://reducedata.com/, Chrome, 2014-03-14 15:30:00Z, IP, Cookie-Info
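Each raw event carries the page URL, browser, timestamp, IP, and cookie information. A minimal Python sketch, assuming comma-separated fields in exactly that order (so URLs containing commas are not handled), turns the timestamp into bucket names usable as column names: `YYYYMMDD` matches the day columns in the table below, while the `YYYYMMDDHH` hour bucket is an assumption suggested by the `view_counts_hourly` name. `Event`, `parse_event`, `day_bucket`, and `hour_bucket` are hypothetical helpers.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    url: str
    browser: str
    timestamp: datetime
    ip: str
    cookie: str


def parse_event(line: str) -> Event:
    """Split one comma-separated event line (field order assumed from the slide)."""
    url, browser, ts, ip, cookie = (part.strip() for part in line.split(","))
    # The trailing "Z" is matched literally and means UTC
    timestamp = datetime.strptime(ts, "%Y-%m-%d %H:%M:%SZ").replace(tzinfo=timezone.utc)
    return Event(url, browser, timestamp, ip, cookie)


def day_bucket(ts: datetime) -> str:
    # Day column name, e.g. "20140314"
    return ts.strftime("%Y%m%d")


def hour_bucket(ts: datetime) -> str:
    # Hour column name, e.g. "2014031415" (assumed format)
    return ts.strftime("%Y%m%d%H")


event = parse_event("http://reducedata.com/, Chrome, 2014-03-14 15:30:00Z, IP, Cookie-Info")
print(day_bucket(event.timestamp), hour_bucket(event.timestamp))  # 20140314 2014031415
```

Fixed-width, most-significant-first bucket names sort chronologically as plain strings, which is what lets a UTF8Type comparator keep the day columns in time order.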

RowKey             20140101  20140102  20140103  20140104  ...  ...  20140628  ...  20150308
sid1#us            2553      2341      2342      3242      ...  ...  32342     ...  33423
sid1#us#chrome     1556      1532      1892      ...       ...  ...  ...       ...  ...
sid1#us#chrome#25  833       899       1200
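Each row key chains dimension values with `#`: presumably a site id, then country, then browser, then a further value (`25`) that the slide does not name; each day column holds a counter. The sketch below assumes every event increments each prefix of its dimension chain and uses a plain dict in place of the counter column family; `rollup_keys`, `record_view`, and `views_between` are hypothetical helpers. The filtered `sum(measure)` query from earlier then becomes a range read on a single row.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

# row key -> day column ("YYYYMMDD") -> counter value
counters: DefaultDict[str, Dict[str, int]] = defaultdict(dict)


def rollup_keys(dimensions: List[str]) -> List[str]:
    """["sid1", "us", "chrome"] -> ["sid1#us", "sid1#us#chrome"] (site id alone skipped, as in the slide)."""
    return ["#".join(dimensions[:i]) for i in range(2, len(dimensions) + 1)]


def record_view(dimensions: List[str], day: str) -> None:
    # One event fans out into one counter increment per rollup row
    for key in rollup_keys(dimensions):
        row = counters[key]
        row[day] = row.get(day, 0) + 1


def views_between(row_key: str, start_day: str, end_day: str) -> int:
    # In Cassandra this would be one column slice over the sorted day columns;
    # here we simply filter, relying on YYYYMMDD strings sorting chronologically
    row = counters.get(row_key, {})
    return sum(v for day, v in row.items() if start_day <= day <= end_day)


record_view(["sid1", "us", "chrome", "25"], "20140101")
record_view(["sid1", "us", "chrome", "25"], "20140102")
record_view(["sid1", "us", "firefox"], "20140102")
record_view(["sid1", "us", "chrome"], "20140105")

# views where day between 20140101 and 20140102 and country = us and browser = chrome
print(views_between("sid1#us#chrome", "20140101", "20140102"))  # 2
print(views_between("sid1#us", "20140101", "20140105"))  # 4
```

Only prefixes of the chain are materialised here, so filtering on browser without country would need its own row (for example a hypothetical `sid1#chrome`). The trade-off is several counter writes per event in exchange for reads that touch one row.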

But what about uniques?


Bitmaps to the rescue

1   0   1   0   1   1   0   0   0   1    1    0    1    0    0    1
u1  u2  u3  u4  u5  u6  u7  u8  u9  u10  u11  u12  u13  ...  ...  ...
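A minimal Python sketch of the idea, using an `int` as the bit set and assuming each user already has a small integer index (u1 is bit 0, u2 is bit 1, and so on): a visit sets the user's bit, a repeat visit leaves it unchanged, and the number of unique users is the number of set bits. `mark_visit` and `unique_count` are hypothetical names.

```python
def mark_visit(bitmap: int, user_index: int) -> int:
    """Set the bit for user_index; setting it again changes nothing."""
    return bitmap | (1 << user_index)


def unique_count(bitmap: int) -> int:
    # Population count: the number of distinct users seen
    return bin(bitmap).count("1")


bitmap = 0
for user in [0, 2, 4, 5, 2, 0, 9]:  # u1, u3, u5, u6, u3 again, u1 again, u10
    bitmap = mark_visit(bitmap, user)
print(unique_count(bitmap))  # 5
```

At one bit per user, a bitmap covering a million users takes about 125 KB.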

UID: 1328abc2838fd283e282

Fast hash function: Murmur32

1   0   1   0   1   1   0   0   0   1    1    0    1    0    0    1
u1  u2  u3  u4  u5  u6  u7  u8  u9  u10  u11  u12  u13  ...  ...  ...
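Real UIDs are opaque strings like the one above, so they are hashed to a bit position. The sketch below is a pure-Python MurmurHash3 (x86, 32-bit) following the public reference algorithm, which avoids a third-party dependency; whether the talk used exactly this Murmur variant is an assumption, as is the bitmap size `NUM_BITS` and the helper name `bit_index`. Because different UIDs can land on the same bit, counts from a hashed bitmap can undercount unless the bitmap is large relative to the number of users.

```python
NUM_BITS = 1 << 20  # hypothetical bitmap size: 1,048,576 bits (128 KiB)
MASK = 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, following the reference algorithm."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        # Body: mix each little-endian 4-byte block into the state
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK
        k = ((k << 15) | (k >> 17)) & MASK
        k = (k * c2) & MASK
        h ^= k
        h = ((h << 13) | (h >> 19)) & MASK
        h = (h * 5 + 0xE6546B64) & MASK
    tail = data[4 * n_blocks:]
    if tail:
        # Tail: the remaining 1-3 bytes, little-endian
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK
        k = ((k << 15) | (k >> 17)) & MASK
        k = (k * c2) & MASK
        h ^= k
    # Finalization mix (fmix32)
    h ^= len(data) & MASK
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h


def bit_index(uid: str) -> int:
    # Map a UID string to a bit position in the bitmap
    return murmur3_32(uid.encode("utf-8")) % NUM_BITS


bitmap = 0
for uid in ["1328abc2838fd283e282", "1328abc2838fd283e282"]:
    bitmap |= 1 << bit_index(uid)
print(bin(bitmap).count("1"))  # 1: the same UID always lands on the same bit
```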

RowKey             20140101  20140102  20140103  20140104  ...  ...  20140628  ...  20150308
sid1#us            10101     10111     11100     11101     ...  ...  ...       ...  11101
sid1#us#chrome     ...       ...       ...       ...       ...  ...  ...       ...  ...
sid1#us#chrome#25  10101     11101     11100     ...       ...  ...  ...       ...  ...
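Because each cell now holds a bitmap rather than a count, uniques over several days come from OR-ing the day columns before counting; adding up daily uniques would count a returning user once per day. The sketch reuses the first three bit patterns of the `sid1#us` row as Python `int`s and introduces hypothetical helpers `popcount` and `uniques_between`; how a real column would serialize its bitmap is not specified in the slides.

```python
from functools import reduce
from typing import Dict

# One row ("sid1#us"): day column -> bitmap of users seen that day
row: Dict[str, int] = {
    "20140101": 0b10101,
    "20140102": 0b10111,
    "20140103": 0b11100,
}


def popcount(bitmap: int) -> int:
    return bin(bitmap).count("1")


def uniques_between(row: Dict[str, int], start_day: str, end_day: str) -> int:
    # OR the daily bitmaps so a user seen on several days is counted once
    days = [bits for day, bits in row.items() if start_day <= day <= end_day]
    return popcount(reduce(lambda a, b: a | b, days, 0))


print(uniques_between(row, "20140101", "20140103"))  # 5
print(sum(popcount(bits) for bits in row.values()))  # 10: summing daily uniques overcounts
```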


But I do not have Big Data


Oh, and we’re hiring!

([email protected])

Thanks! @vagmi on GitHub / Twitter / Facebook