API analytics with BigQuery, by Javier Ramirez from teowaki


DESCRIPTION

At https://teowaki.com we have a system for API usage analytics, with Redis as a fast intermediate store and Google BigQuery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in just a few seconds, and we can find usage patterns that wouldn’t be obvious otherwise. In this session I will talk about how we entered the Big Data world, which alternatives we evaluated, and how we are using Redis and BigQuery to solve our problem.


javier ramirez @supercoco9

API Analytics with Redis, BigQuery, and Apps Script

a two-people start-up

a different league...

...or maybe not

moral of the story

you can do big, if you know how

Set a distance.

Set an expiration time.

Bye bye noise.


REST API (Ruby on Rails) +

Web on top (AngularJS)


data that’s an order of magnitude greater than data you’re accustomed to


Doug Laney, VP Research, Business Analytics and Performance Management at Gartner

data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.

Ed Dumbill, program chair for the O’Reilly Strata Conference


big data is doing a full scan of 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds


Javier Ramirez, impressionable teowaki founder

1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time


Twitter, Stack Overflow, Pinterest, Booking.com, World of Warcraft, YouPorn, HipChat, Snapchat


ntopng, LogStash


Non-intrusive metrics

Capture data really fast.

Then process the data in the background.


Gzip to AWS S3/Glacier

or Google Cloud Storage


tools we considered:

Hadoop
Cassandra
Hadoop + Voldemort + Kafka
HBase
…
Amazon Redshift

but...

hard to set up and monitor

not interactive enough

expensive cluster


Our choice:

Google BigQuery

Data analysis as a service

http://developers.google.com/bigquery


Based on “Dremel”

Specifically designed for interactive queries over petabytes of real-time data


loading data

You just send the data in text (or JSON) format


SQL


select name from USERS order by date;

select count(*) from users;

select max(date) from USERS;

select sum(total) from ORDERS group by user;

specific extensions for analytics


within, flatten, nest

stddev

top, first, last, nth

variance

var_pop, var_samp

covar_pop, covar_samp

quantiles

correlations

Things you always wanted to try but were too scared to
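
For example, two quick sketches of these extensions against our stats table (sketches only; the latency column is an assumption made for illustration):

select top(uri, 10), count(*) from stats;

select quantiles(latency, 5) from stats;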


select count(*) from publicdata:samples.wikipedia
where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;

223,163,387 rows. Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)

columnar storage


highly distributed execution using a tree


[web console screenshot]


country-segmented traffic
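
Something along these lines would produce that report (a sketch; the country column is an assumption about our stats schema):

select country, count(*) requests
from stats
group by country
order by requests desc;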


window functions
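
For instance (a sketch; the user and created_at fields on stats are assumptions), ranking each user's requests by recency with a window function:

select user, uri,
  row_number() over (partition by user order by created_at desc) recency
from stats;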


our most active user

new users per month
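
Hedged sketches of the two reports above (the user and created_at columns are assumptions about our schema):

our most active user:

select user, count(*) requests
from stats
group by user
order by requests desc
limit 1;

new users per month:

select strftime_utc_usec(parse_utc_usec(created_at), '%Y-%m') signup_month, count(*) new_users
from users
group by signup_month
order by signup_month;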


10 requests we should be caching
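
Roughly something like this (a sketch; it assumes cache-worthy traffic is repeated GET requests):

select uri, count(*) hits
from stats
where method = 'GET'
group by uri
order by hits desc
limit 10;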


5 most created resources

select uri, count(*) total from stats where method = 'POST' group by uri;


...but

/users/javier/shouts
/users/rgo/shouts
/teams/javier-community/links
/teams/nosqlmatters-cgn/links
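
One hedged workaround is to collapse each URI down to its resource type with REGEXP_EXTRACT before grouping (a sketch, not necessarily the exact query from the talk):

select regexp_extract(uri, '/([^/]+)$') resource, count(*) total
from stats
where method = 'POST'
group by resource
order by total desc
limit 5;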


5 most created resources

SELECT repository_name, repository_language, repository_description,
       COUNT(repository_name) AS cnt, repository_url
FROM github.timeline
WHERE type = "WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
  AND repository_url IN (
    SELECT repository_url
    FROM github.timeline
    WHERE type = "CreateEvent"
      AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
      AND repository_fork = "false"
      AND payload_ref_type = "repository"
    GROUP BY repository_url
  )
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25

NO

Automation with Apps Script

Read from BigQuery

Create a spreadsheet on Drive

E-mail it every day as a PDF


BigQuery pricing

$26 per stored TB
1,000,000 rows => $0.00416 / month

£0.00243 / month

$5 per processed TB
1 full scan = 160 MB
1 count = 0 MB
1 full scan over 1 column = 5.4 MB
100 GB => $0.05 / month (£0.03)


£0.054307 / month*

per 1MM rows

*the 1st 1TB every month is free of charge
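
As a rough sanity check on the storage figure above (assuming, per the processing numbers, that 1MM of our rows weigh about 160 MB): 160 MB is roughly 0.00016 TB, and 0.00016 TB × $26/TB ≈ $0.00416 per month, which matches the figure quoted.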


1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time



Find related links at

https://teowaki.com/teams/javier-community/link-categories/bigquery-talk

Thanks! תודה (thank you)

Javier Ramírez @supercoco9
