API Analytics with Redis and Google BigQuery. LRUG


DESCRIPTION

At teowaki we have a system for API usage analytics, with Redis as a fast intermediate store and BigQuery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in just a few seconds, and we can find usage patterns that wouldn't be obvious otherwise. In this session I will talk about how we entered the big data world, which alternatives we evaluated, and how we are using Redis and BigQuery to solve our problem.


API Analytics with Redis and Google BigQuery

javier ramirez @supercoco9 https://teowaki.com

REST API + AngularJS web as an API client

obvious solution: use a ready-made service such as 3scale or Apigee


1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time


data that’s an order of magnitude greater than data you’re accustomed to


Doug Laney, VP Research, Business Analytics and Performance Management at Gartner

data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.

Ed Dumbill, program chair for the O'Reilly Strata Conference


big data is doing a full scan over 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds


Javier Ramirez, impressionable teowaki founder


Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)

$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q

SET: 552,028 requests per second
GET: 707,463 requests per second
LPUSH: 767,459 requests per second
LPOP: 770,119 requests per second

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (without pipelining)

$ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q

SET: 122,556 requests per second
GET: 123,601 requests per second
LPUSH: 136,752 requests per second
LPOP: 132,424 requests per second
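That roughly 5x gap is what pipelining buys: many commands are batched into a single network round trip instead of paying one round trip per command. A minimal redis-rb sketch of the same idea (the key names are made up for illustration):

require "redis"  # redis-rb gem

redis = Redis.new

# One round trip per command: the client blocks on each reply.
1_000.times { |i| redis.set("key:#{i}", i.to_s) }

# Pipelined: commands are queued locally and flushed in a single
# round trip; replies come back together when the block exits.
redis.pipelined do |pipeline|
  1_000.times { |i| pipeline.set("key:#{i}", i.to_s) }
end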


open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.

http://redis.io

started in 2009 by Salvatore Sanfilippo @antirez

100 contributors at https://github.com/antirez/redis
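Those data structures map naturally onto API analytics. A minimal sketch, with key names invented for illustration rather than taken from the talk:

require "redis"
require "json"

redis = Redis.new

# A list as an append-only queue of raw request events.
redis.lpush("api:events", { path: "/users", status: 200, ms: 12 }.to_json)

# A hash as a bag of per-endpoint hit counters.
redis.hincrby("api:hits", "/users", 1)

# A sorted set as a ranking of the most active API clients.
redis.zincrby("api:clients", 1, "client-42")

# Read the ranking back, highest first.
redis.zrevrange("api:clients", 0, 9, with_scores: true)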


what is it used for


twitter

Every timeline (800 tweets per user) is stored in Redis

5,000 writes per second avg
300K reads per second


nginx + lua + redis

apache + mruby + redis


Redis keeps everything in memory all the time


easy: store GZIPPED files into S3/Glacier
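A minimal sketch of that archival step, assuming events accumulate in a Redis list called api:events and using the aws-sdk-s3 gem; the bucket, key names, and region are illustrative assumptions, not teowaki's:

require "redis"
require "zlib"
require "aws-sdk-s3"

redis = Redis.new
key = Time.now.utc.strftime("%Y%m%dT%H-%M-%SZ") + ".gz"

# Rotate the list atomically so new events keep flowing into api:events.
# (RENAME raises if the source key does not exist, i.e. nothing to archive.)
redis.rename("api:events", "api:events:archiving")
events = redis.lrange("api:events:archiving", 0, -1)
redis.del("api:events:archiving")

# Gzip the rows to a temp file...
Zlib::GzipWriter.open("/tmp/#{key}") { |gz| gz.write(events.join("\n")) }

# ...and ship it to S3, from where it can age out to Glacier.
Aws::S3::Resource.new(region: "eu-west-1")
  .bucket("my-analytics-archive")
  .object(key)
  .upload_file("/tmp/#{key}")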


* we are moving to google cloud now


Hadoop (map/reduce)


http://hadoop.apache.org/

started in 2005 by Doug Cutting and Mike Cafarella

cassandra


http://cassandra.apache.org/

released in 2008 by Facebook.

other big data solutions:

hadoop+voldemort+kafka

hbase


http://engineering.linkedin.com/projects

http://hbase.apache.org/

Amazon Redshift


http://aws.amazon.com/redshift/

Our choice:

google bigquery

Data analysis as a service

http://developers.google.com/bigquery


Based on Dremel

Specifically designed for interactive queries over petabytes of real-time data


Columnar storage

Easy to compress

Convenient for querying long series over a single column


loading data

You can feed flat CSV files or nested JSON objects
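For instance, newline-delimited JSON with one (possibly nested) object per row; this schema is invented for illustration:

{"path": "/users", "status": 200, "ms": 12, "client": {"id": "client-42", "country": "UK"}}
{"path": "/teams", "status": 404, "ms": 3, "client": {"id": "client-7", "country": "ES"}}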


bq cli

bq load --nosynchronous_mode --encoding UTF-8 \
  --field_delimiter 'tab' --max_bad_records 100 \
  --source_format CSV api.stats 20131014T11-42-05Z.gz

[web console screenshot]


almost SQL


select, from, join, where, group by, having, order, limit

avg, count, max, min, sum

+ - * / %

& | ^ << >> ~

=, !=, <>, >, <, >=, <=, IN, IS NULL, BETWEEN

AND, OR, NOT

Functions overview


current_date, current_time, now, datediff, day, day_of_week, day_of_year, hour, minute, quarter, year, ...

abs, acos, atan, ceil, floor, degrees, log, log2, log10, PI, SQRT, ...

concat, contains, left, length, lower, upper, lpad, rpad, right, substr

analytics-specific extensions


within, flatten, nest

stddev

top, first, last, nth

variance

var_pop, var_samp

covar_pop, covar_samp

quantiles


window functions

correlations


Things you always wanted to try but were too scared to


select count(*) from publicdata:samples.wikipedia
where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;

223,163,387
Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)

SELECT repository_name, repository_language, repository_description,
       COUNT(repository_name) AS cnt, repository_url
FROM github.timeline
WHERE type = "WatchEvent"
AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
AND repository_url IN (
  SELECT repository_url
  FROM github.timeline
  WHERE type = "CreateEvent"
  AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
  AND repository_fork = "false"
  AND payload_ref_type = "repository"
  GROUP BY repository_url
)
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
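The #{yesterday} placeholders above are Ruby string interpolation: the query is built inside a Ruby program before being sent to BigQuery. A minimal sketch of that pattern, driving the bq CLI (the variable and the shortened query are assumptions for illustration):

require "date"

# e.g. "2013-10-13"
yesterday = (Date.today - 1).strftime("%Y-%m-%d")

sql = <<~SQL
  SELECT COUNT(*) FROM github.timeline
  WHERE type = "WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
SQL

# Hand the interpolated query to the bq command-line tool.
system("bq", "query", sql)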


country-segmented traffic


our most active user


10 requests we should be caching


5 most created resources

redis pricing

2 machines* (master/slave) at Digital Ocean

$10 monthly

* we were already using these instances for a lot of Redis use cases


s3 pricing

$0.095 per GB

a gzipped 1.6 MB file stores 300K rows

$0.0001541698 / month


glacier pricing

$0.01 per GB

$0.000016 / month


bigquery pricing

$80 per stored TB
300,000 rows => $0.007629392 / month

$35 per processed TB
1 full scan = 84 MB
1 count = 0 MB
1 full scan over 1 column = 5.4 MB
10 GB => $0.35 / month


redis               $10.0000000000
s3 storage          $00.0001541698
s3 transfer         $00.0050000000
glacier transfer    $00.0500000000
glacier storage     $00.0000160000
bigquery storage    $00.0076293920
bigquery queries    $00.3500000000

$10.41 / month for our first 330,000 rows


1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time


Find related links at

https://teowaki.com/teams/javier-community/link-categories/bigquery-talk

Cheers!

Javier Ramírez @supercoco9
