54
javier ramirez @supercoco9 API Analytics with Redis and Google Bigquery

API analytics with Redis and Google Bigquery. NoSQL matters edition

Embed Size (px)

DESCRIPTION

At teowaki we have a system for API use analytics using Redis as a fast intermediate store and bigquery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in a few seconds and we can try and find for usage patterns that wouldn’t be obvious otherwise. In this session I will speak of the alternatives we evaluated and how we are using Redis and Bigquery to solve our problem.

Citation preview

Page 1: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez@supercoco9

API Analytics with Redis

and Google Bigquery

Page 2: API analytics with Redis and Google Bigquery. NoSQL matters edition
Page 3: API analytics with Redis and Google Bigquery. NoSQL matters edition
Page 4: API analytics with Redis and Google Bigquery. NoSQL matters edition
Page 5: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

REST API +

AngularJS web as an API client

Page 6: API analytics with Redis and Google Bigquery. NoSQL matters edition
Page 7: API analytics with Redis and Google Bigquery. NoSQL matters edition
Page 8: API analytics with Redis and Google Bigquery. NoSQL matters edition
Page 9: API analytics with Redis and Google Bigquery. NoSQL matters edition

obvious solution:

use a ready-made service as 3scale or apigee

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 10: API analytics with Redis and Google Bigquery. NoSQL matters edition

1. non intrusive metrics2. keep the history3. avoid vendor lock-in4. interactive queries5. cheap6. extra ball: real time

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 11: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 12: API analytics with Redis and Google Bigquery. NoSQL matters edition

data that’s an order of magnitude greater than data you’re accustomed to

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Doug Laney VP Research, Business Analytics and Performance Management at Gartner

Page 13: API analytics with Redis and Google Bigquery. NoSQL matters edition

data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.

Ed Dumbill program chair for the O’Reilly Strata Conference

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 14: API analytics with Redis and Google Bigquery. NoSQL matters edition

bigdata is doing a fullscan to 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Javier Ramirezimpresionable teowaki founder

Page 15: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 16: API analytics with Redis and Google Bigquery. NoSQL matters edition

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)

$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q

SET: 552,028 requests per secondGET: 707,463 requests per secondLPUSH: 767,459 requests per secondLPOP: 770,119 requests per second

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (without pipelining)$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q

SET: 122,556 requests per secondGET: 123,601 requests per secondLPUSH: 136,752 requests per secondLPOP: 132,424 requests per second

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 17: API analytics with Redis and Google Bigquery. NoSQL matters edition

open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.

http://redis.io

started in 2009 by Salvatore Sanfilippo @antirez

100 contributors at https://github.com/antirez/redis

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 18: API analytics with Redis and Google Bigquery. NoSQL matters edition

what is it used for

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 19: API analytics with Redis and Google Bigquery. NoSQL matters edition

twitter

Every time line (800 tweets per user) is stored in redis

5000 writes per second avg300K reads per second

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 20: API analytics with Redis and Google Bigquery. NoSQL matters edition

nginx + lua + redis

apache + mruby + redis

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 21: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 22: API analytics with Redis and Google Bigquery. NoSQL matters edition

Redis keeps

everything in memory all the time

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 23: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 24: API analytics with Redis and Google Bigquery. NoSQL matters edition

easy: store GZIPPED files into S3/Glacier

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

* we are moving to google cloud now

Page 25: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 26: API analytics with Redis and Google Bigquery. NoSQL matters edition

Hadoop (map/reduce)

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

http://hadoop.apache.org/

started in 2005 by Doug Cutting and Mike Cafarella

Page 27: API analytics with Redis and Google Bigquery. NoSQL matters edition

cassandra

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

http://cassandra.apache.org/

released in 2008 by facebook.

Page 28: API analytics with Redis and Google Bigquery. NoSQL matters edition

other big data solutions:

hadoop+voldemort+kafka

hbase

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

http://engineering.linkedin.com/projects

http://hbase.apache.org/

Page 29: API analytics with Redis and Google Bigquery. NoSQL matters edition

Amazon Redshift

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

http://aws.amazon.com/redshift/

Page 30: API analytics with Redis and Google Bigquery. NoSQL matters edition

Our choice:

google bigquery

Data analysis as a service

http://developers.google.com/bigquery

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 31: API analytics with Redis and Google Bigquery. NoSQL matters edition

Based on Dremel

Specifically designed for interactive queries over petabytes of real-time data

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 32: API analytics with Redis and Google Bigquery. NoSQL matters edition

Columnar storage

Easy to compress

Convenient for querying long series over a single column

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 33: API analytics with Redis and Google Bigquery. NoSQL matters edition

loading data

You can feed flat CSV files or nested JSON objects

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 34: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

bq cli

bq load --nosynchronous_mode --encoding UTF-8 --field_delimiter 'tab' --max_bad_records 100 --source_format CSV api.stats 20131014T11-42-05Z.gz

Page 35: API analytics with Redis and Google Bigquery. NoSQL matters edition

web console screenshot

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 36: API analytics with Redis and Google Bigquery. NoSQL matters edition

almost SQL

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

selectfromjoinwheregroup byhavingorderlimit

avgcountmaxminsum

+-*/%

&|^<<>>~

=!=<>><>= <=INIS NULLBETWEEN

ANDORNOT

Page 37: API analytics with Redis and Google Bigquery. NoSQL matters edition

Functions overview

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

current_datecurrent_timenowdatediffdayday_of_weekday_of_yearhourminutequarteryear...

absacosatanceilfloordegreesloglog2log10PISQRT...

concatcontainsleftlengthlowerupperlpadrpadrightsubstr

Page 38: API analytics with Redis and Google Bigquery. NoSQL matters edition

analytics specific extensions

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

withinflattennest

stddev

topfirstlastnth

variance

var_popvar_samp

covar_popcovar_samp

quantiles

Page 39: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

window functions

Page 40: API analytics with Redis and Google Bigquery. NoSQL matters edition

correlations

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 41: API analytics with Redis and Google Bigquery. NoSQL matters edition

Things you always wanted to try but were too scare to

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

select count(*) from publicdata:samples.wikipedia

where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;

223,163,387Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)

Page 42: API analytics with Redis and Google Bigquery. NoSQL matters edition

SELECT repository_name, repository_language, repository_description, COUNT(repository_name) as cnt,repository_urlFROM github.timelineWHERE type="WatchEvent"AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")AND repository_url IN (

SELECT repository_urlFROM github.timelineWHERE type="CreateEvent"AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')AND repository_fork = "false"AND payload_ref_type = "repository"GROUP BY repository_url

)GROUP BY repository_name, repository_language, repository_description, repository_urlHAVING cnt >= 5ORDER BY cnt DESCLIMIT 25

Page 43: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

country segmented traffic

Page 44: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

our most active user

Page 45: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

10 request we should be caching

Page 46: API analytics with Redis and Google Bigquery. NoSQL matters edition

javier ramirez @supercoco9 http://teowaki.com nosqlmatters 2013

5 most created resources

Page 47: API analytics with Redis and Google Bigquery. NoSQL matters edition

redis pricing

2* machines (master/slave) at digital ocean

$10 monthly

* we were already using these instances for a lot of redis use cases

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 48: API analytics with Redis and Google Bigquery. NoSQL matters edition

s3 pricing

$0.095 per GB

a gzipped 1.6 MB file stores 300K rows

$0.0001541698 / monthly

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 49: API analytics with Redis and Google Bigquery. NoSQL matters edition

glacier pricing

$0.01 per GB

$0.000016 / monthly

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 50: API analytics with Redis and Google Bigquery. NoSQL matters edition

bigquery pricing

$80 per stored TB300000 rows => $0.007629392 / month

$35 per processed TB1 full scan = 84 MB1 count = 0 MB1 full scan over 1 column = 5.4 MB10 GB => $0.35 / month

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 51: API analytics with Redis and Google Bigquery. NoSQL matters edition

redis $10.0000000000s3 storage $00.0001541698s3 transfer $00.0050000000

glacier transfer $00.0500000000glacier storage $00.0000160000bigquery storage $00.0076293920bigquery queries $00.3500000000

$10.41 / monthfor our first 330000 rows

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 52: API analytics with Redis and Google Bigquery. NoSQL matters edition

1. non intrusive metrics2. keep the history3. avoid vendor lock-in4. interactive queries5. cheap6. extra ball: real time

javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

Page 53: API analytics with Redis and Google Bigquery. NoSQL matters edition
Page 54: API analytics with Redis and Google Bigquery. NoSQL matters edition

Find related links at

https://teowaki.com/teams/javier-community/link-categories/bigquery-talk

Gr cies!à

Javier Ramírez@supercoco9

nosqlmatters 2013