Redshift for Raw Data Analysis @ QuizUp Stefanía Bjarney Ólafsdóttir // Head of Data Science stefaniabje

Redshift for Raw Data Analysis @ QuizUp · 4/29/2015  · • RedShift • iPython Notebook server • Random Forest Classifier from scikit-learn • Method: • Aggregate data into

Page 2

What is QuizUp?

Page 3

What is QuizUp?

• The biggest trivia game in the world!
• 1M registered members in the first week
• 32M+ registered members
• 600K questions in 1000+ topics in 6 languages
• 27 trillion (no joke) questions answered

Page 4

Data Science @ QuizUp

Page 5

Data Stack

• Clients
• Presentation: In-house Dashboards, One-off Reports, Reusable Reports
• Main Data Sources: Mixpanel, Amplitude, Operational PostgreSQL shards, AWS RedShift

Page 6

Tools & Data Sources

• Tools / Data Storages: PostgreSQL, RedShift, Python, iPython Notebook, Python’s Pandas, R, bash, Jenkins
• External tracking services: Mixpanel, Amplitude, AppAnnie, Flurry, Cloudwatch, Facebook, Twitter, Freshdesk, Fiksu, HockeyApp, AppsFlyer

Page 7

How is RedShift useful?

Page 8

Random Forest

Page 9

Random Forest

• Utils:
  • RedShift
  • iPython Notebook server
  • Random Forest Classifier from scikit-learn
• Method:
  • Aggregate data into “helper” tables in RedShift
    • All first games and their properties
    • All retained users and their properties
  • Transform user properties to boolean variables
  • Run Classifier to identify key factors for Day 1 Retention
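A minimal sketch of the “helper” table step, assuming a hypothetical raw `game_ended` table with `user_id`, `game_id` and `ended_at` (calendar date) columns. Python's built-in `sqlite3` stands in for RedShift so the example runs locally; the `CREATE TABLE ... AS SELECT` pattern carries over, but date arithmetic (here SQLite's `date(..., '+1 day')`) is written differently in RedShift. Real helper tables would also carry each first game's properties, which are left out here.

```python
import sqlite3

# sqlite3 is a local stand-in for RedShift; table/column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game_ended (user_id INTEGER, game_id INTEGER, ended_at TEXT);
INSERT INTO game_ended VALUES
    (1, 10, '2015-04-01'), (1, 11, '2015-04-02'),
    (2, 12, '2015-04-01'),
    (3, 13, '2015-04-03'), (3, 14, '2015-04-03');

-- Helper table 1: the day of each user's first game.
CREATE TABLE first_games AS
SELECT user_id, MIN(ended_at) AS first_day
FROM game_ended
GROUP BY user_id;

-- Helper table 2: users who played again on the day after their first game.
CREATE TABLE day1_retained AS
SELECT DISTINCT f.user_id
FROM first_games f
JOIN game_ended g
  ON g.user_id = f.user_id
 AND g.ended_at = date(f.first_day, '+1 day');
""")

# Label every first-time player as Day 1 retained (1) or not (0).
rows = conn.execute("""
SELECT f.user_id, f.first_day, r.user_id IS NOT NULL AS retained
FROM first_games f
LEFT JOIN day1_retained r ON r.user_id = f.user_id
ORDER BY f.user_id
""").fetchall()
print(rows)  # [(1, '2015-04-01', 1), (2, '2015-04-01', 0), (3, '2015-04-03', 0)]
```

The labeled rows are what a classifier would later be trained against, after joining in the first-game properties.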

Page 10

Factors Impacting Return Day After First Game*

[Bar chart. Factors: Facebook Connected, Friend Opponent, Ghost Opponent, Has Friends, Has QuizUp Friends, Match Was Live, Got Challenged, Opponent Was Random. Bar values range from 1% to 10%.]

*Fake data!
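The slide does not say how these percentages were computed. One common way to rank factors with scikit-learn's `RandomForestClassifier` is its impurity-based `feature_importances_`. The sketch below shows that, including the “transform to boolean” step via `pd.get_dummies`, on synthetic made-up data with hypothetical column names; it does not reproduce the chart.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic, made-up first-game data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 5000

opponent_type = pd.Series(rng.choice(["friend", "ghost", "random"], size=n))
X = pd.concat(
    [
        # Transform a categorical property into one boolean column per value:
        # opponent_friend, opponent_ghost, opponent_random.
        pd.get_dummies(opponent_type, prefix="opponent").astype(bool),
        pd.DataFrame({
            "has_quizup_friends": rng.random(n) < 0.3,
            "match_was_live": rng.random(n) < 0.7,
        }),
    ],
    axis=1,
)

# Made-up Day 1 retention label: returning is more likely with QuizUp friends.
p_return = 0.15 + 0.30 * X["has_quizup_friends"] + 0.10 * X["opponent_friend"]
y = rng.random(n) < p_return.to_numpy()

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# Impurity-based importances (normalized to sum to 1), largest first.
importances = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.round(3))
```

Impurity-based importances can favor features with many distinct values; with all-boolean inputs that bias is smaller, and `sklearn.inspection.permutation_importance` is an alternative worth comparing against.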

Page 11

Custom Aggregations

Page 12

Quick Access to game / user details

• Game table
  • Join game_started with game_ended
  • Add useful stats to each game entry, e.g. number of chats until the given game
• Player table
  • Chats
  • Posts
  • Games
  • Follows
  • …
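A sketch of the game table, assuming hypothetical `game_started`, `game_ended` and `chat_sent` tables (only the first two names appear on the slide; every column is an assumption). `sqlite3` again stands in for RedShift. The “chats until this game” stat uses a `LEFT JOIN` plus `GROUP BY` rather than a correlated subquery, a formulation that should carry over to RedShift as well.

```python
import sqlite3

# sqlite3 is a local stand-in for RedShift; schemas and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game_started (game_id INTEGER, user_id INTEGER, topic TEXT, started_at TEXT);
CREATE TABLE game_ended   (game_id INTEGER, user_id INTEGER, score INTEGER, ended_at TEXT);
CREATE TABLE chat_sent    (user_id INTEGER, sent_at TEXT);

INSERT INTO game_started VALUES
    (1, 7, 'Geography', '2015-04-01 10:00:00'),
    (2, 7, 'Movies',    '2015-04-02 09:00:00');
INSERT INTO game_ended VALUES
    (1, 7, 120, '2015-04-01 10:01:30'),
    (2, 7,  90, '2015-04-02 09:01:10');
INSERT INTO chat_sent VALUES
    (7, '2015-04-01 12:00:00'), (7, '2015-04-01 18:30:00');

-- One row per (game, player): start joined with end, plus the number of
-- chats the player had sent before this game started.
CREATE TABLE games AS
SELECT s.game_id, s.user_id, s.topic, s.started_at, e.ended_at, e.score,
       COUNT(c.sent_at) AS chats_before_game
FROM game_started s
JOIN game_ended e
  ON e.game_id = s.game_id AND e.user_id = s.user_id
LEFT JOIN chat_sent c
  ON c.user_id = s.user_id AND c.sent_at < s.started_at
GROUP BY s.game_id, s.user_id, s.topic, s.started_at, e.ended_at, e.score;
""")

rows = conn.execute(
    "SELECT game_id, topic, score, chats_before_game FROM games ORDER BY game_id"
).fetchall()
for row in rows:
    print(row)
# (1, 'Geography', 120, 0)
# (2, 'Movies', 90, 2)
```

The inner join on `game_ended` drops games that never finished; a `LEFT JOIN` there would keep them with a NULL `ended_at`.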

Page 13

Dedicated Members – BFFs

• Experimented with “Four-by-Four”
  • User is active “daily” if she visits four days a week for four consecutive weeks
  • … try a bunch of period sizes / minimum counts
• Do Random Forest Classification for Four-by-Four users
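A pure-Python sketch of the Four-by-Four check, assuming “four days a week” means at least four distinct visit days in each period, with periods counted from a hypothetical `origin` date. `period_days`, `min_days` and `n_periods` are the knobs the slide mentions experimenting with; the function name is made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def is_four_by_four(visit_days, origin, period_days=7, min_days=4, n_periods=4):
    """True if the user visited on at least `min_days` distinct days in each of
    `n_periods` consecutive periods of `period_days` days, counted from `origin`."""
    # Distinct visit days per period index (0 = the period starting at origin).
    per_period = Counter((d - origin).days // period_days for d in set(visit_days))
    qualifying = sorted(p for p, count in per_period.items() if count >= min_days)

    # Longest run of consecutive qualifying periods.
    best = run = 0
    prev = None
    for p in qualifying:
        run = run + 1 if prev is not None and p == prev + 1 else 1
        best = max(best, run)
        prev = p
    return best >= n_periods


origin = date(2015, 3, 2)  # hypothetical start date (a Monday)
# Visits Monday to Thursday for four weeks in a row.
visits = [origin + timedelta(weeks=w, days=d) for w in range(4) for d in range(4)]

print(is_four_by_four(visits, origin))       # True
print(is_four_by_four(visits[:-1], origin))  # False: only 3 days in week 4
# Another period size / minimum count: 8+ days in each of 2 consecutive fortnights.
print(is_four_by_four(visits, origin, period_days=14, min_days=8, n_periods=2))  # True
```

In practice the same flag would be computed per user in RedShift and then used as the label for the Random Forest step.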

Page 14

Topics Dashboards

Page 15

Time in Game

*Fake data!

Page 16

All Stats – All Topics – 12 Minutes

• Stats for <100M games in 1200 topics over the last 30 days
  • Countries
  • Gender
  • …
  • Time in game
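The slide does not show the query behind the topic stats. A minimal sketch of a per-topic rollup over a 30-day window, assuming a hypothetical one-row-per-game table with topic, country, gender and start/end timestamps. `sqlite3` stands in for RedShift, and time in game is taken as end minus start; the `strftime('%s', ...)` conversion is SQLite-specific, and RedShift has its own date functions for this.

```python
import sqlite3

# sqlite3 is a local stand-in for RedShift; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (game_id INTEGER, topic TEXT, country TEXT, gender TEXT,
                    started_at TEXT, ended_at TEXT);
INSERT INTO games VALUES
    (1, 'Geography', 'IS', 'f', '2015-04-20 10:00:00', '2015-04-20 10:01:30'),
    (2, 'Geography', 'US', 'm', '2015-04-21 11:00:00', '2015-04-21 11:01:00'),
    (3, 'Movies',    'IS', 'f', '2015-04-22 12:00:00', '2015-04-22 12:02:00'),
    (4, 'Movies',    'US', 'f', '2015-02-01 12:00:00', '2015-02-01 12:02:00');
""")

AS_OF = "2015-04-29"  # hypothetical report date; window = 30 days up to and including it

stats = conn.execute(
    """
    SELECT topic,
           COUNT(*)                                      AS games,
           COUNT(DISTINCT country)                       AS countries,
           SUM(CASE WHEN gender = 'f' THEN 1 ELSE 0 END) AS female_games,
           -- seconds in game; strftime('%s', ...) gives seconds since the epoch
           AVG(CAST(strftime('%s', ended_at) AS INTEGER)
               - CAST(strftime('%s', started_at) AS INTEGER)) AS avg_seconds_in_game
    FROM games
    WHERE started_at >= date(?, '-30 days')
      AND started_at <  date(?, '+1 day')
    GROUP BY topic
    ORDER BY topic
    """,
    (AS_OF, AS_OF),
).fetchall()

for row in stats:
    print(row)
# ('Geography', 2, 2, 1, 75.0)
# ('Movies', 1, 1, 1, 120.0)
```

Game 4 falls outside the window, so Movies counts only one game. A single grouped pass like this computes every stat for every topic at once instead of querying topic by topic.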

Page 17

Ask me anything!