Providing Click Predictions in Real-Time at Scale
Chris Evans
July 19, 2016
(chris.evans@adroll.com)

I. Overview of the Problem

Real-Time Bidding
You visit a webpage:
● (< 100 ms) Real-time auction
● Winner gets to show you their ad

AdRoll needs to bid intelligently:
● Bid more when more likely to click/convert
● Don’t show ads to bots
● Strategically deal with auction dynamics

A First Bidding Strategy

Bid Value = vCPM

● vCPM = “Value of Impression” (constant)
● Gets the most impressions (ads) for your money

…but most customers care about clicks/conversions

Bidding Strategy For Clicks

Bid Value = (vCPC) * (pCTR)

● vCPC = “Value of Click” (constant)
● pCTR (predicted Click-Through Rate) = probability of click
● Gets the most clicks for your money
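The click-based bid is a single product. A minimal Python sketch with hypothetical numbers (a $2 click value and a 0.1% predicted CTR, not figures from the talk); expressing the result per thousand impressions, like vCPM, is an assumption about how the bid is quoted.

```python
def bid_for_clicks(vcpc: float, pctr: float) -> float:
    """Expected value of one impression when paying for clicks.

    vcpc: value of a click (constant)
    pctr: predicted probability that this impression is clicked
    """
    return vcpc * pctr


def to_cpm(value_per_impression: float) -> float:
    """Express a per-impression value per thousand impressions (assumed quoting unit)."""
    return 1000.0 * value_per_impression


# Hypothetical: a click is worth $2.00 and the model predicts a 0.1% CTR.
bid = bid_for_clicks(vcpc=2.00, pctr=0.001)
print(bid, to_cpm(bid))
```

With these numbers the impression is worth $0.002, i.e. a $2 CPM bid; doubling pCTR doubles the bid.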

Computing pCTR

Training a model sounds easy: it’s just a binary classification problem!

click/no-click | features
1 | country=US,browser=Firefox,operating_system=Windows,...
0 | country=FI,browser=Chrome,operating_system=Kali Linux,...
1 | country=UK,browser=Safari,operating_system=iOS,...

Not so easy at scale...
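A minimal sketch of reading one training row in the format shown above, assuming each line is a 0/1 label, a `|`, and comma-separated `name=value` pairs (the `...` in the slide stands for more features and is left out here). The real log format is not given in the slides, and `parse_example` is a hypothetical name.

```python
from typing import Dict, Tuple


def parse_example(line: str) -> Tuple[int, Dict[str, str]]:
    """Split 'label | name=value,name=value,...' into (label, {name: value})."""
    label_part, feature_part = line.split("|", 1)
    features = {}
    for pair in feature_part.strip().split(","):
        if pair:  # tolerate a trailing comma
            name, value = pair.split("=", 1)
            features[name.strip()] = value.strip()
    return int(label_part.strip()), features


rows = [
    "1 | country=US,browser=Firefox,operating_system=Windows",
    "0 | country=FI,browser=Chrome,operating_system=Kali Linux",
]
for row in rows:
    print(parse_example(row))
```

Note that feature values may contain spaces (e.g. `Kali Linux`), so the sketch splits only on `,` and `=`.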

II. Issues and Solutions at Scale

Issues

Scale
● TBs of impression and click logs

Real-Time
● Sub-100 ms window to return a prediction
● Models need to be fresh
● Need automated infrastructure to update models

Solutions

Downsampling
● Many more negative labels than positive
● Train over all positive samples but only a fraction of negative samples
● Scale predictions accordingly at serve time
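The serve-time correction has a closed form. A minimal sketch, assuming negatives are kept uniformly at random with a fixed rate w (the slides don't give the rate) and that the model is calibrated on the downsampled data: keeping only a fraction w of negatives multiplies the odds by 1/w, so the true odds are p / (1 - p) = w · q / (1 - q). The function names are hypothetical.

```python
import random


def downsample(examples, negative_rate, rng):
    """Keep every positive example and each negative with probability negative_rate."""
    return [(y, x) for (y, x) in examples if y == 1 or rng.random() < negative_rate]


def correct_prediction(q, negative_rate):
    """Map a prediction q from a model trained on downsampled data back to the
    original click rate, using p / (1 - p) = w * q / (1 - q)."""
    w = negative_rate
    return w * q / (w * q + (1.0 - q))


# Hypothetical: keep 1% of negatives. A raw model output of ~0.091
# corresponds to a true CTR of ~0.001.
print(correct_prediction(0.091, negative_rate=0.01))
```

The correction is monotone, so it changes the scale of predictions but not their ranking.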

Solutions
Feature Graph / Feature Hashing

[Figure: raw features (User Agent, Timestamp, Country) feed derived features (OS, Hour) and the cross feature Hour x Country; every feature is then hashed to a 32-bit value, e.g. 3df53a24, ac7673b2, a2b42465, 6291d2e1.]

Solutions
Advantages
● Greatly reduces dimensionality
● Store hashed-feature weights in an array for easy random access
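A minimal Python sketch of hashing features (including a crossed feature) into a fixed-size weight array and reading it back for a logistic-regression prediction. The choice of CRC32 as the hash, the 2^20 table size, and the feature names are assumptions for illustration; the slides only say the hashes are 32 bits.

```python
import math
import zlib

NUM_BITS = 20                 # hypothetical: 2**20 weight slots
NUM_SLOTS = 1 << NUM_BITS


def hash_feature(name: str, value: str) -> int:
    """Map 'name=value' to a slot in [0, NUM_SLOTS) via a 32-bit CRC."""
    h = zlib.crc32(f"{name}={value}".encode("utf-8"))  # unsigned 32-bit in Python 3
    return h & (NUM_SLOTS - 1)


def hashed_indices(raw: dict) -> list:
    """Hash the raw features plus one derived cross feature (hour x country)."""
    feats = dict(raw)
    feats["hour_x_country"] = f"{raw['hour']}_{raw['country']}"
    return [hash_feature(k, v) for k, v in sorted(feats.items())]


def predict(weights: list, indices: list) -> float:
    """Binary hashed features: sum the weights at the active slots
    (random access into a flat array), then apply the sigmoid."""
    z = sum(weights[i] for i in indices)
    return 1.0 / (1.0 + math.exp(-z))


weights = [0.0] * NUM_SLOTS   # untrained model: every weight is zero
idx = hashed_indices({"country": "US", "hour": "14", "os": "Windows"})
print(len(idx), predict(weights, idx))
```

Distinct features can collide in the same slot; a larger table trades memory for fewer collisions, while the array size stays fixed no matter how many raw feature values appear.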

Solutions

Languages
● Use a low-level language for computation: D
● Use a scripting language for glue: Python

AWS Services
● EC2 instances to train models
● S3 for general shared storage

Basic Infrastructure

[Figure: Raw Imps + Raw Clicks → Joiner (EC2) → Joined Imp-Cli → Learner (EC2) → pCTR Model → Bidder (EC2); logs, joined data, and the model are shared through S3.]
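A single-machine sketch of what the Joiner step produces, assuming each click record carries the ID of the impression that led to it (the slides don't describe the log schema). At TB scale the real join would be sharded across machines; the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def join_impressions_clicks(
    impressions: Iterable[Tuple[str, Dict[str, str]]],
    clicks: Iterable[str],
) -> List[Tuple[int, Dict[str, str]]]:
    """Label each impression 1 if some click references its impression ID, else 0.

    impressions: (impression_id, features) pairs from the raw impression log
    clicks: impression IDs found in the raw click log
    """
    clicked = set(clicks)  # the click log is far smaller than the impression log
    return [(1 if imp_id in clicked else 0, feats) for imp_id, feats in impressions]


imps = [("imp-1", {"country": "US"}), ("imp-2", {"country": "FI"})]
print(join_impressions_clicks(imps, clicks=["imp-2"]))
```

Holding the (small) click side in memory and streaming the (large) impression side is the usual shape of such a join.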

III. Assessing and Iterating

Assessing Prediction Accuracy
Choice of Metric
● We care about exact probabilities (the bid scales with pCTR) => average logistic loss is more important than AUC, which only measures ranking
● In practice, average log loss and AUC are highly correlated anyway
● We also compare our average prediction with the observed CTR on subsets of traffic
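Why log loss rather than AUC when exact probabilities matter: AUC depends only on the ranking of predictions, so rescaling every prediction leaves it unchanged, while log loss penalizes the miscalibration. A minimal pure-Python sketch with made-up toy numbers.

```python
import math


def avg_log_loss(labels, preds, eps=1e-15):
    """Average logistic loss; predictions are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def auc(labels, preds):
    """Probability that a random positive scores above a random negative
    (ties count half). O(n^2), fine for illustration."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 0, 1, 0]
preds = [0.08, 0.02, 0.06, 0.05, 0.03]
halved = [p / 2 for p in preds]   # same ranking, wrong scale
print(auc(labels, preds), auc(labels, halved))
print(avg_log_loss(labels, preds), avg_log_loss(labels, halved))
```

Both AUC values are identical, while the halved predictions have a higher (worse) log loss; a bidder using them would bid half of what it should.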

Assessing Prediction Accuracy
Mathematical Theory
Perfect predictions => avg prediction == observed CTR (on each subset of traffic)

Conversely, avg prediction == observed CTR (on every subset of traffic) => perfect predictions

In particular, this includes subsets such as “traffic where we predicted a 1% CTR”.
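The check on the subset "traffic where we predicted a 1% CTR" generalizes to bucketing all traffic by predicted CTR and comparing the two averages per bucket. A minimal sketch; the 1-percentage-point bucket width and the toy data are arbitrary choices.

```python
from collections import defaultdict


def calibration_table(labels, preds, bucket_width=0.01):
    """Group traffic by predicted-CTR bucket and return
    (bucket lower edge, count, average prediction, observed CTR) per bucket.
    For well-calibrated predictions the last two columns agree up to noise."""
    groups = defaultdict(list)
    for y, p in zip(labels, preds):
        groups[int(p / bucket_width)].append((y, p))
    table = []
    for b in sorted(groups):
        rows = groups[b]
        avg_pred = sum(p for _, p in rows) / len(rows)
        observed = sum(y for y, _ in rows) / len(rows)
        table.append((b * bucket_width, len(rows), avg_pred, observed))
    return table


# Toy data: 100 impressions predicted at 1% (1 click), 100 at 5% (5 clicks).
labels = [1] + [0] * 99 + [1] * 5 + [0] * 95
preds = [0.01] * 100 + [0.05] * 100
for row in calibration_table(labels, preds):
    print(row)
```

On real traffic, the observed CTR in sparse buckets is noisy, so buckets with few impressions deserve less weight when reading the table.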

Assessing Prediction Accuracy
Visualization (using R Shiny)

Iterating on the Model
Backtesting Webapp (Python Flask)
● Launch EC2 worker instances
● Train/test experimental models on the workers
● Accumulate experiment metrics
● Provide visual comparison

Iterating on the Model
Backtesting Webapp (Python Flask)

REDACTED

Iterating on the Model
Live A/B Testing
● Train and deploy a candidate predictor
● Shard traffic by cookie and send a fixed % to the candidate predictor
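A minimal sketch of sharding by cookie: hashing the cookie ID, rather than choosing randomly per request, keeps each user in the same arm for the whole test. The SHA-256 hash, the salt string, and the function name are assumptions for illustration, not AdRoll's scheme.

```python
import hashlib


def assign_arm(cookie_id: str, candidate_percent: int, salt: str = "pctr-ab-test") -> str:
    """Deterministically assign a cookie to 'candidate' or 'control'.

    The (hypothetical) salt names the experiment so that different tests
    get independent splits of the same cookies."""
    digest = hashlib.sha256(f"{salt}:{cookie_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100   # bucket in [0, 100)
    return "candidate" if bucket < candidate_percent else "control"


# Send roughly 10% of cookies to the candidate predictor.
arms = [assign_arm(f"cookie-{i}", candidate_percent=10) for i in range(10000)]
print(arms.count("candidate") / len(arms))
```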

Iterating on the Model
A/B Testing Results (R Shiny)

REDACTED

Infrastructure

[Figure: Raw Imps + Raw Clicks → Joiner → Joined Imp-Cli, which feeds both Learner A → pCTR Model A and Learner B → pCTR Model B; both models are served by the Bidder. Supporting apps: Click Pred App, A/B Results App, Backtesting App.]

IV. Choice of Model/Optimizer

Model for Click Prediction
● Originally used Logistic Regression
● Upgraded to Factorization Machines

(Check out the blog post by Matt Wilson¹)

¹ http://tech.adroll.com/blog/data-science/2015/08/25/factorization-machines.html

Model for Click Prediction

Factorization Machines

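Bid Value = vCPC · pCTR, with

$$\text{pCTR} = f\Big(\sum_{i} w_i x_i + \sum_{i<j} \langle v_i, v_j \rangle\, x_i x_j\Big)$$

● x_i are binary (hashed) features
● w_i are direct weights
● v_i are k-dimensional embeddings of the features
● f is the sigmoid function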

Model for Click Prediction

Factorization Machines

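Complexity looks O(n^2) but is only O(kn), because the pairwise term can be rewritten (v_{i,l} is the l-th component of v_i):

$$\sum_{i<j} \langle v_i, v_j \rangle\, x_i x_j = \frac{1}{2}\sum_{l=1}^{k}\Big[\Big(\sum_{i} v_{i,l}\, x_i\Big)^2 - \sum_{i} v_{i,l}^2\, x_i^2\Big]$$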
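A minimal pure-Python sketch of a factorization machine prediction that checks the O(kn) form of the pairwise term against the naive double loop. It assumes binary features represented by the list of distinct active hashed indices (so x_i = x_i^2 = 1 for them), has no separate bias term, and uses hypothetical names and random toy parameters.

```python
import math
import random


def fm_interactions_naive(active, v):
    """Sum over pairs i < j of <v_i, v_j> for the active features: O(k n^2)."""
    total = 0.0
    for a in range(len(active)):
        for b in range(a + 1, len(active)):
            vi, vj = v[active[a]], v[active[b]]
            total += sum(p * q for p, q in zip(vi, vj))
    return total


def fm_interactions_fast(active, v, k):
    """Same quantity as 1/2 * sum_l [(sum_i v_il)^2 - sum_i v_il^2]: O(k n)."""
    total = 0.0
    for l in range(k):
        s = sum(v[i][l] for i in active)
        sq = sum(v[i][l] ** 2 for i in active)
        total += 0.5 * (s * s - sq)
    return total


def fm_predict(active, w, v, k):
    """pCTR = sigmoid(sum of active direct weights + pairwise interaction term)."""
    z = sum(w[i] for i in active) + fm_interactions_fast(active, v, k)
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
num_slots, k = 50, 4
w = [rng.gauss(0.0, 0.1) for _ in range(num_slots)]
v = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(num_slots)]
active = [3, 17, 42, 8]   # distinct hashed indices with x_i = 1
print(fm_interactions_naive(active, v), fm_interactions_fast(active, v, k))
print(fm_predict(active, w, v, k))
```

With sparse binary features n is the number of active features per impression, not the size of the hash table, so prediction cost stays small enough for the sub-100 ms budget.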

Optimizer for Click Prediction
We’ve experimented with a variety of optimizers:
● Stochastic Dual Coordinate Ascent (SDCA)
● Stochastic Gradient Descent (SGD)
● L-BFGS

Optimizer for Click Prediction
We currently use Hogwild SGD:
● Performs SGD in parallel over multiple cores
● One shared set of weights
● Each core reads training samples and updates the shared set of weights
● Ignore that weights may change between read and write: no locks, no coordination!
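A minimal Python sketch of the Hogwild pattern for logistic regression on binary hashed features: several threads run SGD over their own shard of samples and update one shared weight list with no locking. It shows the update pattern only; CPython's global interpreter lock prevents a real multi-core speedup, which is one reason to keep this loop in a compiled language such as D. The learning rate, epoch count, and toy data are hypothetical.

```python
import math
import random
import threading


def sgd_worker(samples, weights, lr):
    """One Hogwild worker: logistic-regression SGD on binary features,
    reading and writing the shared weight list without any lock."""
    for label, active in samples:
        z = sum(weights[i] for i in active)          # read shared weights
        grad = 1.0 / (1.0 + math.exp(-z)) - label    # d(log loss)/dz
        for i in active:
            weights[i] -= lr * grad                  # unsynchronized write


def hogwild_train(samples, num_slots, num_workers=4, epochs=5, lr=0.1):
    weights = [0.0] * num_slots                      # one shared set of weights
    for _ in range(epochs):
        shards = [samples[n::num_workers] for n in range(num_workers)]
        threads = [threading.Thread(target=sgd_worker, args=(s, weights, lr))
                   for s in shards]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return weights


rng = random.Random(0)
# Toy data: slot 0 is a bias feature, slot 1 signals a click, slot 2 a non-click.
data = [(1, [0, 1])] * 200 + [(0, [0, 2])] * 200
rng.shuffle(data)
w = hogwild_train(data, num_slots=3)
print(w)
```

Because hashed features are sparse, two workers rarely touch the same slots at once, which is why skipping locks costs little accuracy in practice.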

V. Beyond Click Prediction

Beyond Click Prediction

We can reuse this infrastructure to train additional predictors

Bidding Strategy For Conversions

Bid Value = (vCPA) * (pCTR) * (pPCC)

● vCPA = “Value of Conversion” (constant)
● pCTR = probability of click
● pPCC (post-click conversion) = probability of converting given a click
● Gets the most click-through conversions for your money
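A worked example with hypothetical numbers (not from the talk): a conversion worth $50, a predicted click probability of 0.2%, and a predicted post-click conversion probability of 5% give

```latex
\text{Bid Value} = \text{vCPA}\cdot\text{pCTR}\cdot\text{pPCC}
  = \$50 \times 0.002 \times 0.05
  = \$0.005 \text{ per impression}
```

i.e. $5 per thousand impressions. Each factor enters multiplicatively, so an error in either predictor scales the bid by the same factor.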

Post Click Conversion Prediction

Additional Challenges (vs. Click Pred)
● Conversion definition varies by advertiser
● Far fewer conversions to train on
● Conversions may occur up to 30 days after the click
● Subjectivity in attributing a conversion to a click

Post Click Conversion Prediction

Joining Clicks to Conversions
● We use TrailDB¹, which stores event history by cookie
● Iterate through 30 days of TrailDBs
● A click followed by a conversion is given a positive label
● A click followed by another click or end-of-trail is given a negative label

¹ http://traildb.io/

Post Click Conversion Prediction

Joining Clicks to Conversions

Cookie Trail: imp cli imp cli conv imp cli imp
Labels (one per click): 0, 1, 0
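A minimal sketch of the labelling rule applied to one cookie's time-ordered trail. Events are plain strings here as an assumption; in the real pipeline they come from TrailDB, whose API is not shown, and the 30-day window is not modelled. `label_clicks` is a hypothetical name.

```python
from typing import List


def label_clicks(trail: List[str]) -> List[int]:
    """Label each click in a cookie's trail.

    A click is positive if a conversion comes before the next click;
    it is negative if another click comes first or the trail ends.
    Impressions in between are ignored."""
    labels = []
    pending_click = False
    for event in trail:
        if event == "cli":
            if pending_click:
                labels.append(0)       # followed by another click
            pending_click = True
        elif event == "conv" and pending_click:
            labels.append(1)           # followed by a conversion
            pending_click = False
    if pending_click:
        labels.append(0)               # end of trail
    return labels


print(label_clicks("imp cli imp cli conv imp cli imp".split()))
```

On the trail from the slide this yields one label per click, 0, 1, 0, matching the example.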

Post Click Conversion Prediction

Joining Clicks to Conversions
● Clicks toward the end of the trail are more likely to be labelled 0, since they have had less time to convert
● We can fix this bias with a 30-day look-ahead window…
● ...but this is undesirable (it requires an additional 30-day wait to accumulate data)
● Instead we use a Delayed Feedback Model¹

¹ Chapelle, Olivier. "Modeling delayed feedback in display advertising." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 24 Aug. 2014: 1097-1105.

Post Click Conversion Prediction

Delayed Feedback Model
● Label clicks naively (i.e. label 0 even when there was no time to convert)
● Train an auxiliary model to predict click-to-conversion delay
● Weight negative instances accordingly
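One way to make "weight negative instances accordingly" concrete, assuming (as in Chapelle's paper) that conversion delays are exponentially distributed with a rate supplied by the auxiliary delay model: weight each naively labelled negative by the posterior probability that it will never convert. Chapelle's paper fits the conversion and delay models jointly instead; this weighting is a simplified illustration, not necessarily AdRoll's implementation, and the function name is hypothetical.

```python
import math


def prob_true_negative(p_convert: float, delay_rate: float, elapsed_days: float) -> float:
    """Posterior probability that a click with no conversion so far never converts.

    Assumes the click converts with probability p_convert and that the delay is
    exponential with rate delay_rate (per day). An unconverted click after
    elapsed_days is either a true negative (prob 1 - p) or a conversion still
    to come (prob p * exp(-rate * t))."""
    still_waiting = p_convert * math.exp(-delay_rate * elapsed_days)
    return (1.0 - p_convert) / ((1.0 - p_convert) + still_waiting)


# Hypothetical: 5% conversion probability, mean delay of 3 days.
for days in (0, 1, 5, 30):
    print(days, prob_true_negative(0.05, 1 / 3, days))
```

A fresh click gets weight 0.95 as a negative, while a click unconverted for 30 days counts almost fully, which removes the bias against recent clicks without waiting for a 30-day look-ahead.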

Utility Bidding

Auction dynamics vary by Ad Exchange

The simplest is a Second Price Auction
● Winner pays the price of the second-highest bid
● Incentive to bid your true value

Utility Bidding
But many exchanges aren’t second-price auctions

Win price appears to be a function of bid price

Utility Bidding
Auctions can also include:
● Hard Floors (cannot win if bid below)
● Soft Floors (pay first price if bid below)
● Multistage Auctions
● And many more!
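A minimal sketch of how the clearing price changes under the floor rules listed above, for a single auction with known bids. The behaviour at or above the soft floor (pay the second-highest bid, but at least the floor) is an assumption, and, as the slide says, real exchanges vary.

```python
from typing import List, Optional, Tuple


def run_auction(bids: List[float], hard_floor: float = 0.0,
                soft_floor: Optional[float] = None) -> Optional[Tuple[int, float]]:
    """Return (winner index, price paid), or None if no bid clears the hard floor.

    - Bids below the hard floor cannot win.
    - If the winning bid is below the soft floor, the winner pays its own bid.
    - Otherwise the winner pays the second-highest eligible bid, raised to the
      applicable floor if that is higher (assumed behaviour)."""
    eligible = sorted(((b, i) for i, b in enumerate(bids) if b >= hard_floor), reverse=True)
    if not eligible:
        return None
    top_bid, winner = eligible[0]
    if soft_floor is not None and top_bid < soft_floor:
        return winner, top_bid                       # first price below the soft floor
    second = eligible[1][0] if len(eligible) > 1 else hard_floor
    reserve = max(hard_floor, soft_floor if soft_floor is not None else 0.0)
    return winner, max(second, reserve)


print(run_auction([2.0, 3.5, 1.0]))                  # plain second price
print(run_auction([2.0, 3.5, 1.0], hard_floor=2.5))  # only one bid clears the floor
print(run_auction([2.0, 3.5], soft_floor=4.0))       # winner pays its own bid
print(run_auction([1.0], hard_floor=2.0))            # nobody wins
```

The third case shows why truthful bidding stops being optimal: below a soft floor the winner pays exactly what it bid, so it pays to shade bids there.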

Utility Bidding
For such auctions we use additional predictors:
● Probability we win the auction
● Probability the auction is first price

Questions?