lawrence-evans
I. Overview of the Problem
Real-Time Bidding
You visit a webpage:
● Real-time auction (< 100 ms)
● Winner gets to show you their ad
AdRoll needs to bid intelligently:
● Bid more when a click/conversion is more likely
● Don’t show ads to bots
● Strategically deal with auction dynamics
A First Bidding Strategy
Bid Value = vCPM
● vCPM = “Value of Impression” (constant)
● Gets the most impressions (ads) for your money
…but most customers care about clicks/conversions
Bidding Strategy For Clicks
Bid Value = (vCPC) * (pCTR)
● vCPC = “Value of Click” (constant)
● pCTR (predicted Click-Through Rate) = probability of click
● Gets the most clicks for your money
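A minimal sketch of the click-based bid. It assumes the bid is submitted as a CPM (price per 1000 impressions), which is a common exchange convention but is not stated on the slide; the function name and the dollar figures are hypothetical.

```python
def click_bid_cpm(vcpc: float, pctr: float) -> float:
    """Bid for one impression, expressed as a CPM (price per 1000 impressions).

    The expected value of a single impression is vCPC * pCTR; multiplying
    by 1000 turns that per-impression value into a CPM price (an assumed
    exchange convention, not stated on the slide).
    """
    if not 0.0 <= pctr <= 1.0:
        raise ValueError("pCTR must be a probability in [0, 1]")
    return vcpc * pctr * 1000.0


# Hypothetical numbers: a $2.00 click value with a 0.1% predicted CTR.
print(click_bid_cpm(2.00, 0.001))  # about 2.0, i.e. a $2.00 CPM bid
print(click_bid_cpm(2.00, 0.002))  # doubling pCTR doubles the bid
```

Because the bid is linear in pCTR, any systematic over- or under-prediction translates directly into over- or under-bidding.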
Computing pCTR
Training a model sounds easy: it’s just a binary classification problem!
Label (click/no-click) | features
1 | country=US,browser=Firefox,operating_system=Windows,...
0 | country=FI,browser=Chrome,operating_system=Kali Linux,...
1 | country=UK,browser=Safari,operating_system=iOS,...
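A small parser for rows like the ones above, assuming they are a literal line format (label, a pipe, then comma-separated key=value pairs); the slide may only be illustrative. Values can contain spaces ("Kali Linux"), so the sketch splits on commas and on the first "=" only, and skips the "..." elision marker.

```python
from typing import Dict, List, Tuple


def parse_example(line: str) -> Tuple[int, Dict[str, str]]:
    """Split 'label | key=value,key=value,...' into (label, features)."""
    label_part, feature_part = line.split("|", 1)
    label = int(label_part.strip())
    features: Dict[str, str] = {}
    for pair in feature_part.split(","):
        pair = pair.strip()
        if not pair or pair == "...":
            continue  # skip empty fields and the elision marker
        key, value = pair.split("=", 1)
        features[key] = value
    return label, features


rows = [
    "1 | country=US,browser=Firefox,operating_system=Windows",
    "0 | country=FI,browser=Chrome,operating_system=Kali Linux",
    "1 | country=UK,browser=Safari,operating_system=iOS",
]
examples: List[Tuple[int, Dict[str, str]]] = [parse_example(r) for r in rows]
print(examples[1])
```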
Not so easy at scale...
II. Issues and Solutions at Scale
Issues
Scale
● TBs of impression and click logs
Real-Time
● Sub-100 ms window to return a prediction
● Models need to be fresh
● Need automated infrastructure to update models
Solutions
Downsampling
● Many more negative labels than positive
● Train over all positive samples but only a fraction of negative samples
● Scale predictions accordingly at serve time to undo the bias introduced by sampling
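A sketch of negative downsampling and the serve-time correction. If negatives are kept with rate w, the odds the model learns are inflated by a factor 1/w, so the prediction q is mapped back with p = q / (q + (1 - q) / w). This is the standard odds correction; the function names and sampling rate are hypothetical, not taken from AdRoll's code.

```python
import random
from typing import Any, List, Tuple

Example = Tuple[int, Any]  # (label, features)


def downsample(examples: List[Example], neg_rate: float, seed: int = 0) -> List[Example]:
    """Keep every positive example and each negative with probability neg_rate."""
    rng = random.Random(seed)
    return [(y, x) for y, x in examples if y == 1 or rng.random() < neg_rate]


def recalibrate(q: float, neg_rate: float) -> float:
    """Undo negative downsampling at serve time.

    Sampling negatives at rate w multiplies the odds the model learns by 1/w,
    so the original probability is p = q / (q + (1 - q) / w).
    """
    return q / (q + (1.0 - q) / neg_rate)


# With 10% of negatives kept, a raw model output of 0.5 maps back to 1/11.
print(recalibrate(0.5, 0.1))
```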
Solutions
Features Graph / Feature Hashing
Raw features: User Agent, Timestamp, Country
Derived features: OS, Hour, Hour x Country
Hashed features (32 bits here): 3df53a24 ac7673b2 a2b42465 6291d2e1
Solutions
Advantages
● Greatly reduces dimensionality
● Store hashed-feature weights in an array for easy random access
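A minimal feature-hashing sketch. It uses `zlib.crc32` only because it is a deterministic 32-bit hash in the Python standard library (the slides don't name the hash function), reduces it to a hypothetical 2^20-slot weight array, and assumes OS and Hour have already been extracted from the user agent and timestamp.

```python
import zlib
from typing import Dict, List

NUM_BITS = 20                # hypothetical table size: 2**20 weight slots
NUM_WEIGHTS = 1 << NUM_BITS


def hash_feature(name: str, value: str) -> int:
    """Hash 'name=value' to a 32-bit value, then to a slot in the weight array."""
    return zlib.crc32(f"{name}={value}".encode("utf-8")) % NUM_WEIGHTS


def hashed_indices(raw: Dict[str, str]) -> List[int]:
    """Hash each feature plus one crossed feature (hour x country)."""
    feats = dict(raw)
    feats["hour_x_country"] = f"{raw['hour']}x{raw['country']}"
    return [hash_feature(k, v) for k, v in sorted(feats.items())]


# Flat array of weights: each feature lookup is a single O(1) index.
weights: List[float] = [0.0] * NUM_WEIGHTS


def score(raw: Dict[str, str]) -> float:
    """Linear score = sum of the weights stored at the hashed slots."""
    return sum(weights[i] for i in hashed_indices(raw))


idx = hashed_indices({"os": "Windows", "hour": "14", "country": "US"})
print(idx)
```

The dimensionality is fixed by `NUM_WEIGHTS` no matter how many distinct raw values appear; the price is occasional collisions, where two features share a weight.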
Solutions
Languages
● Use a low-level language for computation: D
● Use a scripting language for glue: Python
AWS Services
● EC2 instances to train models
● S3 for general shared storage
Basic Infrastructure
[Diagram: Raw Imps and Raw Clicks → Joiner (EC2) → Joined Imp-Cli → Learner (EC2) → pCTR Model → Bidder (EC2), with data stored in S3]
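The Joiner appears in the diagram only as a box. This sketch assumes each click record carries the ID of the impression it came from (the join key is hypothetical) and labels every impression 1 or 0 accordingly.

```python
from typing import Dict, Iterable, List, Tuple


def join_imps_clicks(
    impressions: Iterable[Tuple[str, Dict[str, str]]],
    clicks: Iterable[str],
) -> List[Tuple[int, Dict[str, str]]]:
    """Label each impression 1 if some click references its id, else 0.

    impressions: (impression_id, features) pairs
    clicks: ids of impressions that received a click (hypothetical key)
    """
    clicked = set(clicks)  # clicks are rare, so their ids fit in a set
    return [(1 if imp_id in clicked else 0, feats) for imp_id, feats in impressions]


imps = [("a", {"country": "US"}), ("b", {"country": "FI"}), ("c", {"country": "UK"})]
print(join_imps_clicks(imps, ["c", "a"]))
```

At TB scale the same logic would be partitioned, for example by hashing the impression ID so that each EC2 worker joins one shard; that partitioning scheme is an assumption, not something the slides describe.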
III. Assessing and Iterating
Assessing Prediction Accuracy
Choice of Metric
● We care about exact probabilities (the bid is proportional to pCTR, while AUC only measures ranking) => average logistic loss is more important than AUC
● In practice, average log loss and AUC are highly correlated anyway
● We also examine our average prediction vs. observed CTR on subsets of traffic
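A sketch of both checks: average log loss, and average prediction vs. observed CTR per traffic segment. The segment key (country here) and the clipping constant are hypothetical choices.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def avg_log_loss(preds: Sequence[float], labels: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(preds)


def pred_vs_observed(
    preds: Sequence[float], labels: Sequence[int], segments: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """For each traffic segment, return (average prediction, observed CTR)."""
    acc: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])  # [sum preds, clicks, imps]
    for p, y, s in zip(preds, labels, segments):
        acc[s][0] += p
        acc[s][1] += y
        acc[s][2] += 1
    return {s: (a[0] / a[2], a[1] / a[2]) for s, a in acc.items()}


print(avg_log_loss([0.5, 0.5], [1, 0]))
print(pred_vs_observed([0.1, 0.3, 0.2], [0, 1, 0], ["US", "US", "FI"]))
```

A well-calibrated model shows the two numbers close together in every segment, even where log loss alone looks fine.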
Assessing Prediction Accuracy
Mathematical Theory
Perfect Predictions => Avg Prediction == Observed CTR (in expectation, on each subset of traffic)
Conversely, Avg Prediction == Observed CTR (on every subset of traffic) => Perfect Predictions
In particular, this covers subsets such as “traffic where we predicted a 1% CTR”.
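The “traffic where we predicted a 1% CTR” subset can be checked by bucketing impressions on their own prediction. A sketch, where the 0.5% bucket width is a hypothetical choice:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def calibration_by_bucket(
    preds: Sequence[float], labels: Sequence[int], bucket_width: float = 0.005
) -> List[Tuple[float, int, float, float]]:
    """Group impressions by predicted-CTR bucket.

    Returns (bucket lower edge, impressions, avg prediction, observed CTR)
    for each bucket.
    """
    buckets: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for p, y in zip(preds, labels):
        buckets[int(p // bucket_width)].append((p, y))
    rows = []
    for b in sorted(buckets):
        pairs = buckets[b]
        n = len(pairs)
        avg_pred = sum(p for p, _ in pairs) / n
        observed = sum(y for _, y in pairs) / n
        rows.append((b * bucket_width, n, avg_pred, observed))
    return rows


# Synthetic, perfectly calibrated data: 1% and 5% predictions with matching clicks.
preds = [0.01] * 1000 + [0.05] * 1000
labels = [1] * 10 + [0] * 990 + [1] * 50 + [0] * 950
for row in calibration_by_bucket(preds, labels):
    print(row)
```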
Assessing Prediction Accuracy
Visualization (Using R Shiny)
Iterating on the Model
Backtesting Webapp (Python Flask)
● Launch EC2 worker instances
● Train/test experimental models on the workers
● Accumulate experiment metrics
● Provide visual comparison
REDACTED
Iterating on the Model
Live A/B Testing
● Train and deploy a candidate predictor
● Shard traffic by cookie; send a fixed % to the candidate predictor
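One way to shard traffic by cookie is to hash the cookie ID into 100 stable buckets, so a given user always hits the same predictor. A sketch; the salt string and the `assign_arm` name are hypothetical.

```python
import hashlib


def assign_arm(cookie_id: str, candidate_percent: int, salt: str = "pctr-ab-test") -> str:
    """Deterministically route a cookie to 'candidate' or 'control'.

    Hashing salt + cookie gives a stable bucket in [0, 100); buckets below
    candidate_percent go to the candidate predictor.
    """
    digest = hashlib.sha256(f"{salt}:{cookie_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "candidate" if bucket < candidate_percent else "control"


arms = [assign_arm(f"cookie-{i}", 10) for i in range(10000)]
print(arms.count("candidate") / len(arms))  # roughly 0.10
```

Changing the salt per experiment reshuffles which cookies land in the candidate arm, so consecutive tests don't always hit the same users.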
Iterating on the Model
A/B Testing Results (R Shiny)
REDACTED
Infrastructure
[Diagram: Raw Imps and Raw Clicks → Joiner → Joined Imp-Cli → Learner A → pCTR Model A, and → Learner B → pCTR Model B; both models feed the Bidder. Supporting apps: Click Pred App, A/B Results App, Backtesting App]
IV. Choice of Model/Optimizer
Model for Click Prediction
● Originally used Logistic Regression
● Upgraded to Factorization Machines
(Check out the blog post by Matt Wilson: http://tech.adroll.com/blog/data-science/2015/08/25/factorization-machines.html)
Model for Click Prediction
Factorization Machines
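$$\text{Bid Value} = \text{vCPC} \cdot f\left( \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j \right)$$
● x_i are binary (hashed) features
● w_i are direct weights
● v_i are k-dimensional embeddings of the features
● f is the sigmoid function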
Model for Click Prediction
Factorization Machines
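Complexity looks O(n^2) but is only O(kn):
$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j = \frac{1}{2} \sum_{l=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,l} \, x_i \right)^2 - \sum_{i=1}^{n} v_{i,l}^2 \, x_i^2 \right]$$
Each of the k bracketed terms takes O(n) time.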
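A pure-Python sketch that checks the O(kn) form of the pairwise term against the naive O(kn^2) double loop. It relies on the identity Σ_{i<j}⟨v_i, v_j⟩x_i x_j = ½ Σ_l [(Σ_i v_{i,l} x_i)² − Σ_i v_{i,l}² x_i²] from Rendle's original factorization machine paper. The inputs are random and hypothetical, and the global bias term is omitted to match the slide's definitions.

```python
import math
import random
from typing import List


def fm_interactions_naive(x: List[float], v: List[List[float]]) -> float:
    """O(k n^2): sum over all pairs i < j of <v_i, v_j> x_i x_j."""
    n, k = len(x), len(v[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(v[i][l] * v[j][l] for l in range(k))
            total += dot * x[i] * x[j]
    return total


def fm_interactions_fast(x: List[float], v: List[List[float]]) -> float:
    """O(k n): 0.5 * sum_l [(sum_i v_il x_i)^2 - sum_i v_il^2 x_i^2]."""
    n, k = len(x), len(v[0])
    total = 0.0
    for l in range(k):
        s = 0.0   # sum_i v_il * x_i
        sq = 0.0  # sum_i (v_il * x_i)^2
        for i in range(n):
            t = v[i][l] * x[i]
            s += t
            sq += t * t
        total += s * s - sq
    return 0.5 * total


def fm_predict(x: List[float], w: List[float], v: List[List[float]]) -> float:
    """pCTR = sigmoid(linear term + pairwise interaction term)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + fm_interactions_fast(x, v)
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
n, k = 8, 3
x = [float(rng.random() < 0.5) for _ in range(n)]            # binary features
w = [rng.gauss(0.0, 0.1) for _ in range(n)]
v = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
print(fm_interactions_naive(x, v), fm_interactions_fast(x, v), fm_predict(x, w, v))
```

With binary hashed features only the active x_i are nonzero, so in practice n is the number of active features per impression, not the size of the hash table.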
Optimizer for Click Prediction
We’ve experimented with a variety of optimizers:
● Stochastic Dual Coordinate Ascent (SDCA)
● Stochastic Gradient Descent (SGD)
● L-BFGS
Optimizer for Click Prediction
We currently use Hogwild SGD:
● Performs SGD in parallel over multiple cores
● One shared set of weights
● Each core reads training samples and updates the shared set of weights
● Ignore that weights may change between read and write: no locks, no coordination!
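A sketch of the Hogwild structure for logistic regression on binary hashed features: several threads run plain SGD over their own shard of the data and write to one shared weight list with no lock. Caveat: CPython's global interpreter lock means these threads do not actually compute in parallel. The sketch only shows the lock-free update pattern, and real speedups need native code. The data, learning rate and thread count are hypothetical.

```python
import math
import random
import threading
from typing import List, Tuple

Example = Tuple[List[int], int]  # (active hashed feature indices, click label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hogwild_worker(weights: List[float], shard: List[Example], lr: float, epochs: int) -> None:
    """SGD on log loss that reads and writes the shared weights without a lock;
    concurrent updates may occasionally overwrite each other."""
    for _ in range(epochs):
        for features, label in shard:
            p = sigmoid(sum(weights[i] for i in features))
            grad = p - label              # d(log loss) / d(score)
            for i in features:            # binary x_i, so d(score)/d(w_i) = 1
                weights[i] -= lr * grad


def train_hogwild(data: List[Example], num_features: int, num_threads: int = 4,
                  lr: float = 0.1, epochs: int = 5) -> List[float]:
    weights = [0.0] * num_features        # one shared set of weights
    shards = [data[t::num_threads] for t in range(num_threads)]
    threads = [threading.Thread(target=hogwild_worker, args=(weights, s, lr, epochs))
               for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weights


# Synthetic, linearly separable data: features 0-9 push toward a click, 10-19 away.
rng = random.Random(0)
NUM_FEATURES = 20
true_w = [2.0 if i < 10 else -2.0 for i in range(NUM_FEATURES)]
data: List[Example] = []
for _ in range(2000):
    feats = rng.sample(range(NUM_FEATURES), 3)
    data.append((feats, 1 if sum(true_w[i] for i in feats) > 0 else 0))

w = train_hogwild(data, NUM_FEATURES)
accuracy = sum((sigmoid(sum(w[i] for i in f)) > 0.5) == bool(y) for f, y in data) / len(data)
print(accuracy)
```

The bet Hogwild makes is that with sparse features two threads rarely touch the same weight at the same moment, so the occasional lost update costs less than locking would.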
V. Beyond Click Prediction
Beyond Click Prediction
We can reuse this infrastructure to train additional predictors
Bidding Strategy For Conversions
Bid Value = (vCPA) * (pCTR) * (pPCC)
● vCPA = “Value of Conversion” (constant)
● pCTR = probability of click
● pPCC (post-click conversion) = probability of converting given a click
● Gets the most click-through conversions for your money
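Worked arithmetic for the conversion bid, with hypothetical numbers: a $50 conversion value, a 0.1% pCTR and a 2% pPCC give 50 × 0.001 × 0.02 = $0.001 per impression, or a $1.00 CPM. The sketch again assumes bids are expressed as CPM, which the slide does not state.

```python
def conversion_bid_cpm(vcpa: float, pctr: float, ppcc: float) -> float:
    """CPM bid = 1000 * vCPA * P(click) * P(convert | click).

    pCTR * pPCC is the probability that an impression leads to a
    click-through conversion; the factor 1000 converts the per-impression
    value to a CPM price (assumed exchange convention).
    """
    for name, p in (("pCTR", pctr), ("pPCC", ppcc)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    return 1000.0 * vcpa * pctr * ppcc


print(conversion_bid_cpm(50.0, 0.001, 0.02))  # about 1.0, i.e. a $1.00 CPM bid
```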
Post Click Conversion Prediction
Additional Challenges (vs. Click Pred)
● Conversion definition varies by advertiser
● Far fewer conversions to train on
● Conversions may occur up to 30 days after the click
● Subjectivity in attributing a conversion to a click
Post Click Conversion Prediction
Joining Clicks to Conversions
● We use TrailDB (http://traildb.io/), which stores event history by cookie
● Iterate through 30 days of TrailDBs
● A click followed by a conversion is given a positive label
● A click followed by another click or end-of-trail is given a negative label
Post Click Conversion Prediction
Joining Clicks to Conversions
Cookie Trail: imp cli imp cli conv imp cli imp
Labels (one per click): 0, 1, 0
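A sketch of the labeling rule applied to one cookie's time-ordered events. It works on a plain list of event-type strings and does not use the TrailDB API. A conversion with no preceding unlabelled click is ignored, which is an assumption. For the trail “imp cli imp cli conv imp cli imp” it reproduces the labels 0, 1, 0.

```python
from typing import List


def label_clicks(trail: List[str]) -> List[int]:
    """Label each click in a cookie's time-ordered event trail.

    1 if a conversion comes before the next click; 0 if another click or
    the end of the trail comes first.
    """
    labels: List[int] = []
    pending = False  # True while a click is waiting for its label
    for event in trail:
        if event == "cli":
            if pending:
                labels.append(0)  # previous click followed by another click
            pending = True
        elif event == "conv" and pending:
            labels.append(1)      # click followed by a conversion
            pending = False
    if pending:
        labels.append(0)          # click followed by end-of-trail
    return labels


print(label_clicks("imp cli imp cli conv imp cli imp".split()))  # [0, 1, 0]
```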
Post Click Conversion Prediction
Joining Clicks to Conversions
● Clicks toward the end of the trail are more likely to be labelled 0, since they have had less time to convert
● We can fix this bias with a 30-day look-ahead window…
● ...but this is undesirable (it requires an additional 30-day wait to accumulate data)
● Instead we use a Delayed Feedback Model
Chapelle, Olivier. "Modeling delayed feedback in display advertising." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 24 Aug. 2014: 1097-1105.
Post Click Conversion Prediction
Delayed Feedback Model
● Label clicks naively (i.e. label 0 even when there was no time to convert)
● Train an auxiliary model to predict click-to-conversion delay
● Weight negative instances accordingly
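The slides don't give the weighting formula. One possible scheme: assume the delay is exponential, as in Chapelle's model, and weight each naive negative by the probability that it is a true negative given how long it has gone without converting. By Bayes' rule that is (1 − p) / ((1 − p) + p·e^(−λe)), where p is the eventual conversion probability, λ the delay rate and e the elapsed time. This is not necessarily the weighting AdRoll or the paper uses, and the 7-day mean delay below is hypothetical.

```python
import math


def prob_true_negative(p_convert: float, delay_rate: float, elapsed_days: float) -> float:
    """P(click never converts | no conversion seen after elapsed_days).

    Assumes conversion delay ~ Exponential(delay_rate):
        (1 - p) / ((1 - p) + p * exp(-delay_rate * elapsed))
    """
    survival = math.exp(-delay_rate * elapsed_days)  # P(delay > elapsed)
    return (1.0 - p_convert) / ((1.0 - p_convert) + p_convert * survival)


# Hypothetical: 5% eventual conversion rate, mean delay of 7 days.
for days in (0.0, 1.0, 7.0, 30.0):
    print(days, prob_true_negative(0.05, 1.0 / 7.0, days))
```

A click from moments ago counts as a weak negative (weight 1 − p), while one that has gone 30 days without converting counts almost fully.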
Utility Bidding
Auction dynamics vary by Ad Exchange
Simplest is a Second Price Auction
● Winner pays the price of the second-highest bid
● Incentive to bid your true value
Utility Bidding
But many exchanges aren’t simple second-price auctions:
the win price appears to be a function of the bid price
Utility Bidding
Auctions can also include
● Hard Floors (cannot win if bid below)
● Soft Floors (pay first price if bid below)
● Multistage Auctions
● And many more!
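A sketch of how hard and soft floors change the price paid. The exact rules differ between exchanges. Here it is assumed that bids at or above the soft floor pay the second price, floored at the soft floor, and that hard_floor ≤ soft_floor.

```python
from typing import Optional


def price_paid(bid: float, top_other_bid: float,
               hard_floor: float, soft_floor: float) -> Optional[float]:
    """Return the price we pay, or None if we lose.

    Assumed rules (hypothetical; they differ between exchanges):
      * bid below the hard floor, or not above the best competing bid: lose
      * hard floor <= bid < soft floor: first price, we pay our own bid
      * bid >= soft floor: second price, floored at the soft floor
    """
    if bid < hard_floor or bid <= top_other_bid:
        return None
    if bid < soft_floor:
        return bid
    return max(top_other_bid, soft_floor)


print(price_paid(0.5, 0.2, hard_floor=1.0, soft_floor=2.0))  # None: below hard floor
print(price_paid(1.5, 0.2, hard_floor=1.0, soft_floor=2.0))  # 1.5: first price
print(price_paid(3.0, 2.5, hard_floor=1.0, soft_floor=2.0))  # 2.5: second price
```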
Utility Bidding
For such auctions we use additional predictors:
● Probability we win the auction
● Probability the auction is first price
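The slides don't say how the two predictors are combined. One natural use is to pick the bid that maximizes expected surplus. In this sketch, a list of sampled competing bids stands in for the win-probability predictor (P(win | bid) is the fraction of samples below the bid). A single `p_first_price` stands in for the first-price predictor and is treated as independent of the bid, which is a simplification. All numbers are hypothetical.

```python
from typing import Sequence


def expected_utility(bid: float, value: float, competing: Sequence[float],
                     p_first_price: float) -> float:
    """Expected surplus of a bid against samples of the highest competing bid.

    We win when our bid beats the competing bid. With probability
    p_first_price we pay our own bid; otherwise we pay the competing bid.
    """
    total = 0.0
    for d in competing:
        if bid > d:  # win against this sample
            total += p_first_price * (value - bid) + (1.0 - p_first_price) * (value - d)
    return total / len(competing)


def best_bid(value: float, competing: Sequence[float], p_first_price: float,
             candidates: Sequence[float]) -> float:
    return max(candidates, key=lambda b: expected_utility(b, value, competing, p_first_price))


competing = [float(d) for d in range(1, 10)]  # hypothetical competing bids 1..9
grid = [0.5 * i for i in range(21)]           # candidate bids 0.0 .. 10.0
print(best_bid(6.5, competing, 0.0, grid))    # 6.5: second price, bid true value
print(best_bid(6.5, competing, 1.0, grid))    # 3.5: first price, shade the bid
```

The two extremes recover the intuition from the slides: in a pure second-price auction the best bid is the true value, while a first-price auction rewards shading the bid below it.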
Questions?