(2016 07-19) providing click predictions in real-time at scale

Providing Click Predictions in Real-Time at Scale

Chris Evans

July 19, 2016


I. Overview of the Problem

Real-Time Bidding

You visit a webpage:
● (< 100ms) Real-Time auction
● Winner gets to show you their ad

AdRoll needs to bid intelligently:
● Bid more when the user is more likely to click/convert
● Don’t show ads to bots
● Strategically deal with auction dynamics

A First Bidding Strategy

Bid Value = vCPM

● vCPM = “Value of Impression” (constant)
● Gets the most impressions (ads) for your money

…but most customers care about clicks/conversions

Bidding Strategy For Clicks

Bid Value = (vCPC) * (pCTR)

● vCPC = “Value of Click” (constant)
● pCTR (predicted Click Through Rate) = probability of click
● Gets the most clicks for your money

Computing pCTR

Training a model sounds easy -- It’s just a binary classification problem!

click/no-click | features
1 | country=US,browser=Firefox,operating_system=Windows,...
0 | country=FI,browser=Chrome,operating_system=Kali Linux,...
1 | country=UK,browser=Safari,operating_system=iOS,...

Not so easy at scale...

II. Issues and Solutions at Scale

Issues

Scale
● TBs of impression and click logs

Real-Time
● Sub-100ms window to return prediction
● Models need to be fresh
● Need automated infrastructure to update models

Solutions

Downsampling
● Many more negative labels than positive
● Train over all positive samples but only a fraction of negative samples
● Scale predictions accordingly at serve time
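The serve-time scaling can be sketched as follows. This is a minimal illustration, assuming negatives are kept uniformly at random with probability `keep_rate` (a hypothetical name); the slides do not state the exact correction used. Under that assumption the odds learned on the downsampled data are inflated by a factor of 1/keep_rate, and `recalibrate` undoes that.

```python
import random

def downsample(rows, keep_rate, rng):
    """Keep every positive row and each negative row with probability keep_rate.

    rows: list of (label, features) pairs with label in {0, 1}.
    """
    return [(y, x) for y, x in rows if y == 1 or rng.random() < keep_rate]

def recalibrate(p_sampled, keep_rate):
    """Map a probability learned on downsampled data back to the full-traffic scale."""
    return p_sampled / (p_sampled + (1.0 - p_sampled) / keep_rate)

# Example: a true CTR of 1% looks like roughly 9.2% after keeping 10% of negatives.
p_true, keep_rate = 0.01, 0.1
p_sampled = p_true / (p_true + (1.0 - p_true) * keep_rate)
p_back = recalibrate(p_sampled, keep_rate)
```

The correction is applied to the model output only, so training itself never needs to see the discarded negatives.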

Solutions

Features Graph/Feature Hashing

[Diagram: Raw Features (User Agent, Timestamp, Country) feed the graph nodes OS, Hour, and Hour x Country]

Hashed Features (32 bits here): 3df53a24 ac7673b2 a2b42465 6291d2e1

Solutions

Advantages
● Greatly reduces dimensionality
● Store hashed-feature weights in an array for easy random access
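A minimal sketch of feature hashing with array-backed weights. The hash function (CRC32), the 20-bit table size, and all function names here are assumptions for illustration; the slides only say that features are hashed (32 bits in their example) and that weights live in an array.

```python
import zlib

NUM_BITS = 20  # hypothetical table size; the slide's example uses 32-bit hashes

def hash_feature(name, value, num_bits=NUM_BITS):
    """Hash a 'name=value' string into an index in [0, 2**num_bits)."""
    token = f"{name}={value}".encode("utf-8")
    return zlib.crc32(token) & ((1 << num_bits) - 1)

def hashed_indices(raw):
    """Turn raw features into array indices, adding an Hour x Country cross feature."""
    feats = dict(raw)
    feats["hour_x_country"] = f"{raw['hour']}_{raw['country']}"
    return sorted({hash_feature(k, v) for k, v in feats.items()})

# One flat array of weights, indexed directly by the hashed feature.
weights = [0.0] * (1 << NUM_BITS)

def score(raw):
    """Sum the weights of the active hashed features (the linear part of a model)."""
    return sum(weights[i] for i in hashed_indices(raw))

example = {"country": "US", "hour": 13, "os": "Windows"}
indices = hashed_indices(example)
```

No dictionary from feature strings to positions is needed: an unseen feature value simply lands on some slot of the fixed-size array, at the cost of occasional collisions.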

Solutions

Languages
● Use low level language for computation: D
● Use scripting language for glue: Python

AWS Services
● EC2 instances to train models
● S3 for general shared storage

Basic Infrastructure

[Diagram: Raw Imps + Raw Clicks -> Joiner (EC2) -> Joined Imp-Cli -> Learner (EC2) -> pCTR Model -> Bidder (EC2); the data sets and the model are stored in S3]

III. Assessing and Iterating

Assessing Prediction Accuracy

Choice of Metric
● We care about exact probabilities => Average logistic loss more important than AUC
● In practice average log loss and AUC are highly correlated anyway
● We also examine our average prediction vs. observed CTR on subsets of traffic
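To illustrate why log loss is the better fit when exact probabilities matter, here is a small self-contained sketch. The function names and the toy data are made up for illustration. AUC depends only on how predictions are ranked, so doubling every prediction leaves it unchanged, while average log loss penalizes the miscalibration.

```python
import math

def average_log_loss(labels, preds, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def auc(labels, preds):
    """Probability that a random positive is ranked above a random negative."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 0, 0]
calibrated = [0.4, 0.2, 0.2, 0.2]
inflated = [2 * p for p in calibrated]  # same ranking, wrong probabilities
```

Both prediction sets have the same AUC, but the inflated one has a higher average log loss, which is what a bid of vCPC * pCTR would be sensitive to.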

Assessing Prediction Accuracy

Mathematical Theory

Perfect Predictions => Avg Prediction == Observed CTR (on each subset of traffic)

Conversely, Avg Prediction == Observed CTR (on every subset of traffic) => Perfect Predictions

In particular, this must hold on subsets defined by the prediction itself, e.g. the subset “traffic where we predicted a 1% CTR”.
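The subset check can be sketched as a small aggregation. The row layout and names are hypothetical; `key` selects the subset, which can be a traffic attribute or a bucket of the prediction itself.

```python
from collections import defaultdict

def calibration_table(rows, key):
    """Compare average prediction with observed CTR on each subset of traffic.

    rows: iterable of (label, prediction, attrs) with label in {0, 1}.
    key:  function (prediction, attrs) -> subset identifier.
    Returns {subset: (average prediction, observed CTR)}.
    """
    sums = defaultdict(lambda: [0, 0.0, 0])  # clicks, summed predictions, impressions
    for label, pred, attrs in rows:
        entry = sums[key(pred, attrs)]
        entry[0] += label
        entry[1] += pred
        entry[2] += 1
    return {k: (p / n, c / n) for k, (c, p, n) in sums.items()}

rows = [
    (1, 0.5, {"country": "US"}),
    (0, 0.5, {"country": "US"}),
    (0, 0.1, {"country": "FI"}),
]
by_country = calibration_table(rows, key=lambda pred, attrs: attrs["country"])
by_prediction = calibration_table(rows, key=lambda pred, attrs: round(pred, 2))
```

A well-calibrated predictor shows the two numbers close to each other in every row; a large gap on one subset points at where the model is off.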

Assessing Prediction Accuracy

Visualization (Using R Shiny)

Iterating on the Model

Backtesting Webapp (Python Flask)
● Launch EC2 worker instances
● Train/Test experimental models on the workers
● Accumulate experiment metrics
● Provide visual comparison


Iterating on the Model

Live A/B Testing
● Train and deploy a candidate predictor
● Shard traffic by cookie. Send fixed % to the candidate predictor
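Sharding by cookie can be sketched with a stable hash, so that the same cookie always sees the same predictor. The hash choice (SHA-256), the salt, and the names are assumptions for illustration, not a description of the production system.

```python
import hashlib

def assign_predictor(cookie_id, candidate_percent, salt="pctr-ab-test"):
    """Deterministically route a cookie to 'candidate' or 'control'.

    candidate_percent: integer in [0, 100], the share of cookies sent to the candidate.
    salt: hypothetical experiment name, so different tests shard independently.
    """
    digest = hashlib.sha256(f"{salt}:{cookie_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "candidate" if bucket < candidate_percent else "control"

cookies = [f"cookie-{i}" for i in range(10000)]
share = sum(assign_predictor(c, 10) == "candidate" for c in cookies) / len(cookies)
```

Because the assignment depends only on the cookie and the salt, no per-user state has to be stored to keep the split consistent across requests.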

Iterating on the Model

A/B Testing Results (R Shiny)

REDACTED

Infrastructure

[Diagram: Raw Imps + Raw Clicks -> Joiner -> Joined Imp-Cli -> Learner A -> pCTR Model A and Learner B -> pCTR Model B -> Bidder; supporting apps: Click Pred App, A/B Results App, Backtesting App]

IV. Choice of Model/Optimizer

Model for Click Prediction
● Originally used Logistic Regression
● Upgraded to Factorization Machines [1]

[1] Check out the blog post by Matt Wilson: http://tech.adroll.com/blog/data-science/2015/08/25/factorization-machines.html

Model for Click Prediction

Factorization Machines

● x_i are binary (hashed) features
● w_i are direct weights
● v_i are k-dim embeddings of features
● f is the sigmoid function
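The formula on this slide was an image and did not survive extraction. For reference, the standard second-order factorization machine that matches the symbols defined on the slide can be written as below. The bias term $w_0$ is an assumption, since the slide does not mention it.

```latex
\mathrm{pCTR}(x) = f\Big( w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j \Big),
\qquad f(z) = \frac{1}{1 + e^{-z}}
```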

Model for Click Prediction

Factorization Machines

Complexity looks O(n^2) but is only O(kn), because the sum over feature pairs can be rewritten as a sum over the k embedding dimensions.
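The rewrite behind the O(kn) cost is the standard identity: the pairwise term equals one half of the sum, over the k embedding dimensions, of (the square of the summed embeddings minus the sum of the squared embeddings). A minimal sketch follows, assuming binary features represented by the list of active indices; the function names and the bias `w0` are illustrative.

```python
import math

def fm_pairwise_naive(active, V):
    """O(n^2 k): sum of <v_i, v_j> over all pairs of active features."""
    total = 0.0
    for a in range(len(active)):
        for b in range(a + 1, len(active)):
            vi, vj = V[active[a]], V[active[b]]
            total += sum(x * y for x, y in zip(vi, vj))
    return total

def fm_pairwise_fast(active, V):
    """O(n k): 0.5 * sum over dims of [(sum_i v_if)^2 - sum_i v_if^2]."""
    k = len(V[active[0]]) if active else 0
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] for i in active)
        s2 = sum(V[i][f] ** 2 for i in active)
        total += 0.5 * (s * s - s2)
    return total

def fm_predict(active, w0, w, V):
    """Sigmoid of bias + direct weights + pairwise interactions."""
    z = w0 + sum(w[i] for i in active) + fm_pairwise_fast(active, V)
    return 1.0 / (1.0 + math.exp(-z))

V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]  # one k=2 embedding per feature
w = [0.05, -0.02, 0.0, 0.01]
active = [0, 1, 3]  # indices i with x_i = 1
p = fm_predict(active, w0=-1.0, w=w, V=V)
```

With hashed binary features, n here is effectively the number of active features per impression, which keeps the per-prediction cost small.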

Optimizer for Click Prediction

We’ve experimented with a variety of optimizers
● Stochastic Dual-Coordinate Ascent (SDCA)
● Stochastic Gradient Descent (SGD)
● L-BFGS

Optimizer for Click Prediction

We currently use Hogwild SGD
● Performs SGD in parallel over multiple cores
● One shared set of weights
● Each core reads training samples and updates the shared set of weights
● Ignore that weights may change between read and write -- no locks, no coordination!
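The update pattern can be sketched in a few lines. This is an illustration of the lock-free structure only: it uses logistic regression rather than a factorization machine, and CPython threads interleave rather than run truly in parallel, so it shows the shape of the algorithm, not its speed. The production system described in these slides does its computation in D; all names below are hypothetical.

```python
import math
import threading
import random

def hogwild_sgd(samples, dim, num_workers=4, epochs=5, lr=0.1):
    """samples: list of (label, active_indices). Returns the shared weight list."""
    weights = [0.0] * dim  # one shared set of weights, with no lock around it

    def worker(shard, seed):
        rng = random.Random(seed)
        for _ in range(epochs):
            rng.shuffle(shard)
            for label, active in shard:
                # Read the weights; other workers may change them before we write.
                z = sum(weights[i] for i in active)
                z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
                grad = 1.0 / (1.0 + math.exp(-z)) - label
                # Write back without any coordination.
                for i in active:
                    weights[i] -= lr * grad

    threads = [
        threading.Thread(target=worker, args=(samples[w::num_workers], w))
        for w in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weights

# Toy data: feature 0 appears only with clicks, feature 1 only with non-clicks.
samples = [(1, [0, 2]), (0, [1, 2])] * 200
learned = hogwild_sgd(samples, dim=3)
```

The approach works well when each sample touches only a few weights, as with sparse hashed features, because two workers rarely update the same slot at the same moment.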

V. Beyond Click Prediction

Beyond Click Prediction

We can reuse this infrastructure to train additional predictors

Bidding Strategy For Conversions

Bid Value = (vCPA) * (pCTR) * (pPCC)

● vCPA = “Value of Conversion” (constant)
● pCTR = probability of click
● pPCC (post click conversion) = probability of conversion given click
● Gets the most click-through-conversions for your money
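As a worked example of the formula above, with made-up numbers (the value of a conversion and both probabilities are purely illustrative):

```python
def conversion_bid(v_cpa, p_ctr, p_pcc):
    """Bid Value = vCPA * pCTR * pPCC: the expected conversion value of one impression."""
    return v_cpa * p_ctr * p_pcc

# Hypothetical inputs: a conversion worth 50.00, a 0.2% click probability,
# and a 5% chance of converting after the click.
bid = conversion_bid(v_cpa=50.0, p_ctr=0.002, p_pcc=0.05)
```

Here the bid comes to about 0.005 per impression, i.e. the expected conversion value of showing the ad once.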

Post Click Conversion Prediction

Additional Challenges (vs. Click Pred)
● Conversion definition varies by advertiser
● Many fewer conversions to train on
● Conversions may occur up to 30 days after the click
● Subjectivity in attributing conversion to click


Joining Clicks to Conversions
● We use TrailDB [1], which stores event history by cookie
● Iterate through 30 days of TrailDBs
● A click followed by a conversion is given a positive label
● A click followed by another click or end-of-trail is given a negative label

[1] http://traildb.io/


Joining Clicks to Conversions

Cookie Trail: imp cli imp cli conv imp cli imp

Labels (one per click): Label 0, Label 1, Label 0
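The labelling rule for a single cookie trail can be sketched as a small state machine. This operates on a plain list of event names rather than on the TrailDB API, and it ignores the 30-day window; the function name is hypothetical.

```python
def label_clicks(trail):
    """trail: event names in time order. Returns one 0/1 label per click."""
    labels = []
    pending = False  # True while the most recent click has not been resolved
    for event in trail:
        if event == "cli":
            if pending:
                labels.append(0)  # previous click was followed by another click
            pending = True
        elif event == "conv" and pending:
            labels.append(1)      # click followed by a conversion
            pending = False
    if pending:
        labels.append(0)          # click followed by end-of-trail
    return labels

labels = label_clicks("imp cli imp cli conv imp cli imp".split())
```

On the trail shown on the slide this yields the labels 0, 1, 0. Impressions never change a label, and a conversion with no unresolved click before it is ignored.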


Joining Clicks to Conversions
● Clicks toward the end of the trail are more likely to be labelled 0, because less time has passed for a conversion to arrive
● We can fix this bias with a 30 day look-ahead window…
● ...but this is undesirable (requires an additional 30 day wait to accumulate data)
● Instead we use a Delayed Feedback Model [1]

[1] Chapelle, Olivier. "Modeling delayed feedback in display advertising." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 24 Aug. 2014: 1097-1105.


Delayed Feedback Model
● Label clicks naively (i.e. label 0 even when there has been no time to convert)
● Train an auxiliary model to predict click-conversion delay
● Weight negative instances accordingly
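The slides do not give the weighting formula. One way to derive a weight, in the spirit of the cited delayed feedback paper, is to assume the click-to-conversion delay is exponentially distributed with a rate supplied by the auxiliary model. By Bayes' rule, the weight of a negative is then the probability that the click will never convert, given that no conversion has been seen so far. All names below are hypothetical.

```python
import math

def negative_weight(p_convert, delay_rate, elapsed_days):
    """P(click never converts | no conversion observed after elapsed_days).

    p_convert:    current estimate of the conversion probability for this click.
    delay_rate:   rate of the assumed exponential delay distribution (per day).
    elapsed_days: time between the click and the end of the observed data.
    """
    # Probability that the click will convert but has not done so yet.
    still_waiting = p_convert * math.exp(-delay_rate * elapsed_days)
    return (1.0 - p_convert) / (1.0 - p_convert + still_waiting)

fresh = negative_weight(p_convert=0.2, delay_rate=0.1, elapsed_days=0.0)
old = negative_weight(p_convert=0.2, delay_rate=0.1, elapsed_days=30.0)
```

A click seen moments ago gets a weight equal to the prior non-conversion probability, while a click that has gone 30 days without converting is treated as an almost certain negative.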

Utility Bidding

Auction dynamics vary by Ad Exchange

Simplest is a Second Price Auction
● Winner pays the price of the second highest bid
● Incentive to bid your true value
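A minimal sketch of a second price auction. The function name and the rule that a lone bidder pays 0 are assumptions for illustration.

```python
def second_price_auction(bids):
    """bids: dict of bidder -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    # The winner pays the second highest bid (assumed 0.0 when there is no rival).
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price

winner, price = second_price_auction({"a": 3.0, "b": 2.0, "c": 1.0})
```

Bidder "a" wins and pays 2.0. Raising a's bid above 3.0 would not change the price, and lowering it below 2.0 would lose the auction, which is why bidding the true value is the sensible strategy here.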

Utility Bidding

But many exchanges aren’t second price auctions

Win price appears to be a function of bid price

Utility Bidding

Auctions can also include
● Hard Floors (Cannot win if bid below)
● Soft Floors (Pay first price if bid below)
● Multistage Auctions
● And many more!
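The floor rules above can be sketched as a pricing function for a single bidder. Exchanges differ in the details; in this sketch it is an assumption that ties lose and that a winner bidding above the soft floor pays the larger of the best competing bid and the floors. The names are hypothetical.

```python
def clearing_price(bid, best_competing_bid, hard_floor=0.0, soft_floor=0.0):
    """Price paid by `bid`, or None if it does not win."""
    if bid < hard_floor or bid <= best_competing_bid:
        return None  # below the hard floor, or outbid
    if bid < soft_floor:
        return bid   # below the soft floor: first price, pay your own bid
    # At or above the soft floor: second price, but never less than the floors.
    return max(best_competing_bid, soft_floor, hard_floor)

above_soft = clearing_price(5.0, 3.0, hard_floor=1.0, soft_floor=4.0)
below_soft = clearing_price(3.5, 3.0, hard_floor=1.0, soft_floor=4.0)
below_hard = clearing_price(0.5, 0.2, hard_floor=1.0, soft_floor=4.0)
```

With these rules the price is no longer independent of the bid, which is consistent with the observation that the win price appears to be a function of the bid price.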

Utility Bidding

For such auctions we use additional predictors
● Probability we win the auction
● Probability auction is first price

Questions?
