An overview of Hulu’s metrics platform

Tristan Reid, Prasan Samtani (prasan.samtani@hulu.com)


Page 1: An overview of  Hulu’s  metrics platform

An overview of Hulu’s metrics platform

Tristan Reid

Prasan Samtani (prasan.samtani@hulu.com)

Page 2: An overview of  Hulu’s  metrics platform

What we do

• Streaming video service
• > 5.5 million subscribers
• > 20 million unique visitors/month
• > 1 billion ads/month

Page 3: An overview of  Hulu’s  metrics platform

It all begins with beacons

Living room device (Roku, Xbox, etc.)

Mobile device (Android, iPhone, etc.)

Web (hulu.com)

Beacon collection service

Page 4: An overview of  Hulu’s  metrics platform

What’s in a beacon

80 2013-04-01 00:00:00 /v3/playback/start?bitrate=650&cdn=Akamai&channel=Anime&client=Explorer&computerguid=EA8FA1000232B8F6986C3E0BE55E9333&contentid=5003673…
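The beacon above reads as a prefix field, a timestamp, and a URL-style path with query parameters. A minimal parsing sketch in Python follows, assuming that layout; the function name `parse_beacon`, the returned keys, and the reading of the path as version/beacon type/event are illustrative assumptions, and the meaning of the leading `80` is not stated in the slides.

```python
import re
from datetime import datetime
from urllib.parse import parse_qsl, urlsplit

# Assumed layout: "<prefix> <YYYY-MM-DD HH:MM:SS> <path>?<query>".
# The whitespace before the path is optional so both spellings parse.
BEACON_RE = re.compile(r"^(\S+) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*(/\S*)$")


def parse_beacon(line):
    """Split one beacon log line into timestamp, path parts, and parameters."""
    match = BEACON_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised beacon line: %r" % line)
    prefix, stamp, url = match.groups()
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")  # e.g. ["v3", "playback", "start"]
    return {
        "prefix": prefix,  # meaning not given in the talk
        "timestamp": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
        "beacon_type": segments[1] if len(segments) > 1 else None,
        "event": segments[2] if len(segments) > 2 else None,
        "params": dict(parse_qsl(parts.query)),
    }


# Shortened copy of the beacon on the slide.
sample = "80 2013-04-01 00:00:00 /v3/playback/start?bitrate=650&cdn=Akamai&channel=Anime"
beacon = parse_beacon(sample)
print(beacon["beacon_type"], beacon["event"], beacon["params"]["cdn"])
```

All parameter values stay as strings here; converting fields such as `bitrate` to numbers would be a per-beacon-type decision.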

Page 5: An overview of  Hulu’s  metrics platform

Reporting platform (RP2)

Find Metrics & Dimensions

Design and execute reports

Page 6: An overview of  Hulu’s  metrics platform

The pipeline

Devices

Beacon collection service

HDFS

Hive

RDBMS

LogCollector/Flume

MapReduce jobs/JobScheduler

Harpy – continuous aggregation

Reporting (RP2)

Monitoring (metstat)

Developers

Business

Page 7: An overview of  Hulu’s  metrics platform

Devices

Load balancer

Log Collection

Log Collection machine #1

Log Collection machine #11

HDFS: files bucketed by beacon type and partitioned by hour

Page 8: An overview of  Hulu’s  metrics platform

Directory hierarchy on HDFS

/user/hadoop/t2
  201401010000/
    playback/
      201401010100_playback_1.seq
      201401010100_playback_2.seq
      …
    revenue/
  201401010100
    playback/
    revenue/
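The layout above, an hour-level directory containing one subdirectory per beacon type, can be sketched as a small path helper. This is a sketch under assumptions: `hour_partition` and `bucket_dir` are hypothetical names, and file names inside each directory are deliberately left out because the slides do not spell out their naming rule.

```python
import posixpath
from datetime import datetime

HDFS_ROOT = "/user/hadoop/t2"  # root directory shown on the slide


def hour_partition(when):
    """Format a timestamp as the hour-level directory name, e.g. 201401010000."""
    return when.strftime("%Y%m%d%H") + "00"


def bucket_dir(beacon_type, when, root=HDFS_ROOT):
    """Directory for one beacon type within one hour partition."""
    return posixpath.join(root, hour_partition(when), beacon_type)


print(bucket_dir("playback", datetime(2014, 1, 1, 0, 37)))
print(bucket_dir("revenue", datetime(2014, 1, 1, 1, 5)))
```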

Page 9: An overview of  Hulu’s  metrics platform

MapReduce - going from beacons to basefacts

computerguid EA8FA1000232B8F6986C3E0BE55E9333
userid 5238518
video_id 289696
content_partner_id 398
distribution_partner_id 602
distro_platform_id 14
is_on_hulu 0
…
hourid 383149
watched 76426
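To illustrate the step from beacons to a basefact row like the one above, here is a minimal map/reduce sketch in plain Python. It is not the production job (the slides describe Java MapReduce running on the cluster); the function names, the reduced set of dimensions, and the toy `watched` values are all assumptions made for illustration.

```python
from collections import defaultdict

# Dimension names taken from the basefact row on the slide; elided ones are omitted.
DIMENSIONS = ("hourid", "computerguid", "userid", "video_id")


def map_beacon(beacon):
    """Map step: emit (dimension tuple, watched) for one parsed beacon."""
    key = tuple(beacon.get(name) for name in DIMENSIONS)
    yield key, int(beacon.get("watched", 0))


def reduce_watched(pairs):
    """Reduce step: sum the watched values for each distinct key."""
    totals = defaultdict(int)
    for key, watched in pairs:
        totals[key] += watched
    return dict(totals)


def to_basefacts(beacons):
    """Run map then reduce and return one row per distinct dimension tuple."""
    pairs = (pair for beacon in beacons for pair in map_beacon(beacon))
    rows = []
    for key, watched in reduce_watched(pairs).items():
        row = dict(zip(DIMENSIONS, key))
        row["watched"] = watched
        rows.append(row)
    return rows


# Toy input: identifiers reuse the slide's values, watched amounts are invented.
guid = "EA8FA1000232B8F6986C3E0BE55E9333"
beacons = [
    {"hourid": 383149, "computerguid": guid, "userid": 5238518, "video_id": 289696, "watched": 10},
    {"hourid": 383149, "computerguid": guid, "userid": 5238518, "video_id": 289696, "watched": 20},
    {"hourid": 383150, "computerguid": guid, "userid": 5238518, "video_id": 289696, "watched": 5},
]
basefacts = to_basefacts(beacons)
print(basefacts)
```

Many beacons collapse into one row per combination of dimensions, which is why the basefact carries a summed `watched` value alongside its keys.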

Page 10: An overview of  Hulu’s  metrics platform

If a program manipulates a large amount of data, it does so in a small number of ways - Alan Perlis

Page 11: An overview of  Hulu’s  metrics platform

The BeaconSpec compiler

Definitions of beacons and base-facts

BeaconSpec compiler

Java MapReduce code that can run on the cluster

Page 12: An overview of  Hulu’s  metrics platform

What does our language look like?

basefact playback_watched_uniques from playback/(position|end) {
    dimension harpyhour.id as hourid;
    dimension computerguid as computerguid;
    dimension userid as userid;
    required dimension video.id as video_id;
    required dimension contentPartner.id as content_partner_id;
    …
    dimension siteSessionId.chosen as site_session_id;
    dimension facebook.isfacebookconnected as is_facebook_connected;
    fact sum(watched.out) as watched;
}
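One way to read the definition above is as data that drives a generic aggregation. The Python sketch below mimics that idea on a small scale and is not the BeaconSpec compiler, which emits Java MapReduce code. Several points are assumptions: that the `from` clause selects beacons by type and event, that a beacon missing a `required` dimension is dropped, and the dictionary layout of `SPEC` itself.

```python
import re

# Hypothetical in-memory form of a shortened basefact definition.
SPEC = {
    "name": "playback_watched_uniques",
    "source": r"playback/(position|end)",
    "dimensions": [
        # (beacon field, output column, required)
        ("computerguid", "computerguid", False),
        ("userid", "userid", False),
        ("video.id", "video_id", True),
    ],
    "fact": ("watched.out", "watched"),  # summed per group
}


def compile_spec(spec):
    """Return a function that aggregates (source, fields) beacons per the spec."""
    source_re = re.compile(spec["source"])
    fact_field, fact_column = spec["fact"]
    columns = [column for _, column, _ in spec["dimensions"]]

    def run(beacons):
        totals = {}
        for source, fields in beacons:
            if not source_re.fullmatch(source):
                continue  # not selected by the "from" clause
            if any(req and fields.get(f) is None for f, _, req in spec["dimensions"]):
                continue  # assumption: a missing required dimension drops the beacon
            key = tuple(fields.get(f) for f, _, _ in spec["dimensions"])
            totals[key] = totals.get(key, 0) + fields.get(fact_field, 0)
        return [dict(zip(columns, key), **{fact_column: total}) for key, total in totals.items()]

    return run


aggregate = compile_spec(SPEC)
rows = aggregate([
    ("playback/position", {"computerguid": "A", "userid": 1, "video.id": 7, "watched.out": 10}),
    ("playback/end", {"computerguid": "A", "userid": 1, "video.id": 7, "watched.out": 5}),
    ("playback/start", {"computerguid": "A", "userid": 1, "video.id": 7, "watched.out": 99}),
    ("playback/end", {"computerguid": "B", "userid": 2, "watched.out": 3}),
])
print(rows)
```

Keeping the definition declarative means a new basefact is a new entry of data, with no new aggregation code to write.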

FAQ: Why didn’t we just use Pig?

Page 13: An overview of  Hulu’s  metrics platform

The superior [program] cultivates itself so as to give rest to [programmers] - Confucius, the Way of the Superior Man

Page 14: An overview of  Hulu’s  metrics platform

Scheduling jobs

JobScheduler Interface

Outside world

Logmanager databases

JobScheduler

Checks databases for jobs that are ready to run and whether dependencies are met

JobMonitor / MapReduce job

JobMonitor / MapReduce job

JobMonitor / MapReduce job
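The readiness check described on this slide can be sketched as a pure function over a dependency map. This is an assumption-laden illustration: the real JobScheduler reads its state from the Logmanager databases, whereas here the jobs live in a dictionary, and the job names and the `ready_jobs` helper are hypothetical.

```python
def ready_jobs(jobs, completed, running=()):
    """Return the names of jobs whose dependencies have all completed.

    jobs maps a job name to the set of job names it depends on.
    """
    done = set(completed)
    busy = set(running)
    return sorted(
        name
        for name, deps in jobs.items()
        if name not in done and name not in busy and set(deps) <= done
    )


# Hypothetical job graph: a rollup that needs both basefact jobs first.
jobs = {
    "playback_basefacts": set(),
    "revenue_basefacts": set(),
    "daily_rollup": {"playback_basefacts", "revenue_basefacts"},
}
first_wave = ready_jobs(jobs, completed=[])
second_wave = ready_jobs(jobs, completed=["playback_basefacts", "revenue_basefacts"])
print(first_wave, second_wave)
```

Each job returned by such a check would then be handed to its own JobMonitor, matching the one-monitor-per-job pairs in the diagram.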

Page 15: An overview of  Hulu’s  metrics platform

JobScheduler technology

• The actor model of concurrency
  – Communication through async messaging
  – Completely encapsulated state

Page 16: An overview of  Hulu’s  metrics platform

Actor creation

Message passing

Central idea: Treat local objects as if they are distributed, as opposed to treating distributed objects as if they are local
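The two properties named for the actor model, asynchronous messaging and fully encapsulated state, can be shown with a minimal actor built from a thread and a queue. This is a sketch in plain Python and not the JobScheduler's code or any actor library; `CounterActor` and its message names are invented for illustration.

```python
import queue
import threading


class CounterActor:
    """Tiny actor: one mailbox, one thread, state never shared directly."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message, reply_to=None):
        """Send a message without waiting for it to be processed."""
        self._mailbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)  # answer by message, not by shared access

    def join(self, timeout=1):
        self._thread.join(timeout)


actor = CounterActor()
for _ in range(3):
    actor.tell("increment")
replies = queue.Queue()
actor.tell("get", reply_to=replies)
count = replies.get(timeout=1)
actor.tell("stop")
actor.join()
print(count)
```

Callers only ever enqueue messages, so the same calling code would work unchanged if the mailbox were backed by a network connection, which is the sense in which local objects are treated as if they were distributed.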

Page 17: An overview of  Hulu’s  metrics platform

Fault-tolerance – let it crash!
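"Let it crash" usually means that workers carry no defensive error handling and a supervisor replaces any worker that fails. The synchronous Python sketch below illustrates only that idea; it is a simplification that does not use an actor framework, and `FlakyWorker` and `Supervisor` are hypothetical names.

```python
class FlakyWorker:
    """Worker with no defensive error handling: bad input simply raises."""

    def __init__(self):
        self.processed = 0  # state is lost when the worker is replaced

    def handle(self, message):
        self.processed += 1
        return 100 // message  # raises ZeroDivisionError for message == 0


class Supervisor:
    """Replaces a crashed worker with a fresh instance and carries on."""

    def __init__(self, worker_factory):
        self._factory = worker_factory
        self.worker = worker_factory()
        self.restarts = 0

    def send(self, message):
        try:
            return self.worker.handle(message)
        except Exception:
            self.worker = self._factory()  # discard possibly corrupt state
            self.restarts += 1
            return None


supervisor = Supervisor(FlakyWorker)
results = [supervisor.send(message) for message in (5, 0, 4)]
print(results, supervisor.restarts)
```

Recovery logic lives in one place, the supervisor, so the worker's code stays focused on the normal path.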

Page 18: An overview of  Hulu’s  metrics platform

Harpy – continuous aggregations

HDFS NFS

Metadata

Output DBs

Harpy

DataSync

Publishing

HoldingDB

HoldingSweeper Agg

Scheduler

Queue Processor

Hive

Page 19: An overview of  Hulu’s  metrics platform

RP2

• Reporting Portal for pulling Metrics + Dimensions

• Quick ‘Demo’

Page 20: An overview of  Hulu’s  metrics platform

Let’s Reexamine the pipeline:

Devices

Beacon collection service

HDFS

Hive

RDBMS

LogCollector/Flume

MapReduce jobs/JobScheduler

Harpy – continuous aggregation

Reporting (RP2)

Monitoring (metstat)

Developers

Business

Page 21: An overview of  Hulu’s  metrics platform
Page 22: An overview of  Hulu’s  metrics platform

Metstat

• Python Django App
• Tasks on Celery + RabbitMQ
• JQuery
• Tracks status, status changes and statistics
• Gets data directly from various sources (databases, HDFS)
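Tracking "status, status changes and statistics" can be pictured as recording a transition whenever an observed status differs from the last one seen. The sketch below is plain Python with no Django or Celery, and it is not metstat's code; `StatusTracker`, the job names, and the status strings are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


class StatusTracker:
    """Keeps the latest status per job and a log of every change."""

    def __init__(self):
        self._current = {}
        self.changes = []  # (timestamp, job, old status, new status)

    def observe(self, job, status, when):
        """Record a status reading; log it only if the status changed."""
        old = self._current.get(job)
        if old != status:
            self.changes.append((when, job, old, status))
            self._current[job] = status

    def statistics(self):
        """Count how many jobs are currently in each status."""
        return Counter(self._current.values())


tracker = StatusTracker()
tracker.observe("playback_basefacts", "running", datetime(2014, 1, 1, 1, 0))
tracker.observe("playback_basefacts", "running", datetime(2014, 1, 1, 1, 5))
tracker.observe("playback_basefacts", "failed", datetime(2014, 1, 1, 1, 10))
tracker.observe("revenue_basefacts", "running", datetime(2014, 1, 1, 1, 0))
print(len(tracker.changes), dict(tracker.statistics()))
```

In a deployment like the one on the slide, a periodic task would presumably call something like `observe` after polling each data source.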

Page 23: An overview of  Hulu’s  metrics platform

FAQ: Why didn’t we just use Pig?

• Dataflow language – runs on Hadoop
• Pig philosophy – (Taken from the Apache website)
  – Pigs eat anything
  – Pigs live anywhere
  – Pigs are domestic animals
  – Pigs fly

Beaconspec

Page 24: An overview of  Hulu’s  metrics platform

Beware of the Turing tar-pit where everything is possible but nothing of interest is easy - Alan Perlis

REGISTER ./tutorial.jar;
raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(query);
clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.ToLower(query) as query;

Beaconspec

Page 25: An overview of  Hulu’s  metrics platform

FAQ: What is open sourced?

• Slickint – database interface generation for Scala– github.com/zenbowman/slickint

• Local filesystem caching for hadoop– github.com/ZenBowman/luna