1
Estimating Rates of Rare Events at Multiple Resolutions
Deepak Agarwal, Andrei Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, Mayssam Sayyadian
2
Estimation in the “tail”
Contextual Advertising
• Show an ad on a webpage (“impression”)
• Revenue is generated if a user clicks
• Problem: Estimate the click-through rate (CTR) of an ad on a page
• Most (ad, page) pairs have very few impressions, if any, and even fewer clicks
• Severe data sparsity
3
Estimation in the “tail”
• Use an existing, well-understood hierarchy
• Categorize ads and webpages to leaves of the hierarchy
• CTR estimates of siblings are correlated
• The hierarchy allows us to aggregate data
• Coarser resolutions provide reliable estimates for rare events, which then influence estimation at finer resolutions
4
System overview
Retrospective data [URL, ad, isClicked]
• Crawl a sample of URLs
• Classify pages and ads
• Impute impressions, fix sampling bias
• Rare event estimation using hierarchy
5
Sampling of webpages
• Naïve strategy: sample at random from the set of URLs
  • Sampling errors in impression volume AND click volume
• Instead, we propose: crawling all URLs with at least one click, and a sample of the remaining URLs
  • Variability is only in impression volume
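The proposed sampling strategy can be sketched as below. This is an illustration only: the function name `choose_urls_to_crawl`, the input dictionary, and the choice of sampling each non-clicked URL independently with probability 1/k are assumptions, not details given on the slide.

```python
import random


def choose_urls_to_crawl(url_clicks, k, seed=0):
    """Split URLs into a clicked pool (kept whole) and a sampled non-clicked pool.

    url_clicks: dict mapping URL -> number of clicks observed (hypothetical input).
    k: assumed down-sampling factor; each non-clicked URL is kept with probability 1/k.
    """
    rng = random.Random(seed)
    # Every URL with at least one click is crawled, so click volume has no sampling error.
    clicked = [u for u, c in url_clicks.items() if c > 0]
    # Only the non-clicked URLs are sampled, so variability is confined to impression volume.
    sampled = [u for u, c in url_clicks.items() if c == 0 and rng.random() < 1.0 / k]
    return clicked, sampled
```

With `k=1` every non-clicked URL is kept; larger values of `k` crawl fewer pages at the cost of more variability in the impression counts.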
6
Imputation of impression volume
[Figure: matrix with page classes as rows and ad classes as columns]
• Each cell: #impressions = nij + mij + xij
  • nij: clicked pool
  • mij: sampled non-clicked pool
  • xij: excess impressions (to be imputed)
• Column constraint: sums to #impressions on ads of this ad class
• Row constraint: sums to ∑nij + K·∑mij
• All cells: sums to total impressions (known)
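A small worked example of the row constraint, assuming it applies to the total nij + mij + xij in a row and that K is the factor by which the non-clicked pool was down-sampled (both are readings of the figure, not statements from the slide). The function names are hypothetical.

```python
def row_target(n_row, m_row, k):
    """Row total implied by the row constraint: sum(n) + K * sum(m)."""
    return sum(n_row) + k * sum(m_row)


def excess_to_impute(n_row, m_row, k):
    """Excess impressions the row must absorb, i.e. target minus what was observed."""
    return row_target(n_row, m_row, k) - sum(n_row) - sum(m_row)


# One page class with two ad classes: 8 clicked-pool impressions,
# 30 sampled non-clicked impressions, and an assumed K of 4.
target = row_target([5, 3], [10, 20], 4)
excess = excess_to_impute([5, 3], [10, 20], 4)
```

Under these assumptions the excess for a row is (K - 1)·∑mij, which is the volume the imputation has to spread across the cells of that row.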
7
Imputation of impression volume
• Region = (page node, ad node)
• Region hierarchy: a cross-product of the page hierarchy and the ad hierarchy
[Figure: page hierarchy and ad hierarchy from Level 0 to Level i; a region is one cell of the page classes by ad classes grid]
9
Imputing xij
Iterative Proportional Fitting [Darroch+/1972]
• Initialize xij = nij + mij
• Iteratively scale xij values to match the row/column/block constraints
• Ordering of constraints: top-down, then bottom-up, and repeat
[Figure labels: Level i, Level i+1, block, page classes, ad classes]
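The scaling loop can be sketched for a single level with only row and column constraints. This is a generic iterative proportional fitting routine in plain Python, not the authors' code; the block constraints and the top-down/bottom-up ordering across levels are left out, and the name `ipf` and its parameters are hypothetical.

```python
def ipf(x, row_targets, col_targets, max_iters=1000, tol=1e-9):
    """Rescale a non-negative matrix until its row and column sums match the targets.

    x: list of lists holding the initial cell values (e.g. nij + mij).
    row_targets, col_targets: desired sums; their grand totals must agree.
    """
    x = [row[:] for row in x]  # work on a copy
    for _ in range(max_iters):
        # Scale each row to its target.
        for i, row in enumerate(x):
            s = sum(row)
            if s > 0:
                x[i] = [v * row_targets[i] / s for v in row]
        # Scale each column to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in x)
            if s > 0:
                for row in x:
                    row[j] *= target / s
        # Columns now match exactly; stop once rows match as well.
        if max(abs(sum(row) - t) for row, t in zip(x, row_targets)) < tol:
            break
    return x


fitted = ipf([[1.0, 2.0], [3.0, 4.0]], row_targets=[4.0, 6.0], col_targets=[5.0, 5.0])
```

Cells that start at zero stay at zero under this scheme, so the initial values determine where imputed volume can go.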
10
Imputation: Summary
Given
• nij (impressions in clicked pool)
• mij (impressions in sampled non-clicked pool)
• # impressions on ads of each ad class in the ad hierarchy
We get
• Estimated impression volume Ñij = nij + mij + xij in each region ij of every level
11
System overview
Retrospective data [page, ad, isClicked]
• Crawl a sample of pages
• Classify pages and ads
• Impute impressions, fix sampling bias
• Rare event estimation using hierarchy
12
Rare rate modeling
1. Freeman-Tukey transform:
• yij = F-T(clicks and impressions at ij) ≈ transformed-CTR
• Variance stabilizing transformation: Var(y) is independent of E[y] (needed in further modeling)
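The slide does not spell out the transform. The sketch below uses one common form of the Freeman-Tukey transform for a rate, y = ½(√(c/N) + √((c+1)/N)), with c clicks and N impressions; treat this exact form, and the function name, as an assumption rather than the authors' definition.

```python
import math


def freeman_tukey(clicks, impressions):
    """Assumed Freeman-Tukey form for a click rate: average of two square-root terms."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 0.5 * (math.sqrt(clicks / impressions)
                  + math.sqrt((clicks + 1) / impressions))


# Two regions with zero clicks but different impression volumes.
y_small = freeman_tukey(0, 100)
y_large = freeman_tukey(0, 10000)
```

Under this form, two regions with zero clicks still get different values: the region with more impressions gets the smaller transformed rate, reflecting stronger evidence that its CTR is low.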
13
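Rare rate modeling
2. Generative Model (Tree-structured Markov Model)
[Figure: for each region ij, an observed yij (covariates βij, variance Vij) and an unobserved “state” Sij (variance Wij); the parent region has the corresponding yparent(ij), Sparent(ij), βparent(ij), Vparent(ij), Wparent(ij)]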
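One plausible reading of the diagram, written as a linear-Gaussian state-space model on the tree. The exact functional form is an assumption; in particular the covariate vector u_ij is a hypothetical symbol that does not appear on the slide.

```latex
\begin{aligned}
y_{ij} &= u_{ij}^{\top}\beta_{ij} + S_{ij} + \epsilon_{ij},
  & \epsilon_{ij} &\sim \mathcal{N}(0,\, V_{ij}) \\
S_{ij} &= S_{\mathrm{parent}(ij)} + w_{ij},
  & w_{ij} &\sim \mathcal{N}(0,\, W_{ij})
\end{aligned}
```

Read this way, V controls how noisy each region's observed rate is, while W controls how far a region's state may drift from its parent's.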
14
Rare rate modeling
Model fitting with a 2-pass Kalman filter:
• Filtering: leaf to root
• Smoothing: root to leaf
Linear in the number of regions
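A minimal two-pass sketch on a tree, written as Gaussian message passing in information form rather than as the authors' Kalman recursions. It assumes scalar states, a link S_child = S_parent + noise with variance W, observations y = S + noise with variance V, and a zero-mean prior with variance W at the root; covariates and parameter estimation are omitted. Under those assumptions it yields the exact posterior mean and variance of every state, and each node is visited once per pass. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    y: Optional[float] = None  # observation, or None if the region is unobserved
    V: float = 1.0             # observation noise variance
    W: float = 1.0             # variance of (state - parent state); root prior variance
    children: list = field(default_factory=list)
    J_up: float = 0.0          # precision of evidence from the subtree (filled by filter_up)
    h_up: float = 0.0          # precision-weighted mean of that evidence
    mean: float = 0.0          # posterior mean (filled by smooth_down)
    var: float = 0.0           # posterior variance (filled by smooth_down)


def _across_link(J, h, W):
    """Carry evidence (J, h) across a parent-child link that adds variance W."""
    scale = 1.0 + J * W
    return J / scale, h / scale


def filter_up(node):
    """Leaf-to-root pass: aggregate the evidence in each node's subtree."""
    J = 0.0 if node.y is None else 1.0 / node.V
    h = 0.0 if node.y is None else node.y / node.V
    for child in node.children:
        filter_up(child)
        mJ, mh = _across_link(child.J_up, child.h_up, child.W)
        J, h = J + mJ, h + mh
    node.J_up, node.h_up = J, h


def smooth_down(node, J_prior, h_prior):
    """Root-to-leaf pass: combine subtree evidence with information from above."""
    J, h = node.J_up + J_prior, node.h_up + h_prior
    node.mean, node.var = h / J, 1.0 / J
    for child in node.children:
        mJ, mh = _across_link(child.J_up, child.h_up, child.W)
        # Remove the child's own contribution before sending information back down.
        dJ, dh = _across_link(J - mJ, h - mh, child.W)
        smooth_down(child, dJ, dh)


def fit(root):
    filter_up(root)
    smooth_down(root, 1.0 / root.W, 0.0)
    return root
```

A region with a small V (many impressions) is dominated by its own observation, while a region with a large V or no observation is pulled toward what its parent and siblings imply.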
15
Experiments
• 503M impressions
• 7-level hierarchy, of which the top 3 levels were used
• Zero clicks in
  • 76% of regions in level 2
  • 95% of regions in level 3
• Full dataset DFULL, and a 2/3 sample DSAMPLE
16
Experiments
Estimate CTRs for all regions R in level 3 with zero clicks in DSAMPLE
Some of these regions R>0 get clicks in DFULL
A good model should predict higher CTRs for R>0 than for the other regions in R
17
Experiments
We compared 4 models
• TS: our tree-structured model
• LM (level-mean): each level smoothed independently
• NS (no smoothing): CTR proportional to 1/Ñ
• Random: assuming |R>0| is given, randomly predict the membership of R>0 out of R
19
Experiments
• Enough impressions: little “borrowing” from siblings
• Few impressions: estimates depend more on siblings
20
Related Work
• Multi-resolution modeling: studied in time series modeling and spatial statistics [Openshaw+/79, Cressie/90, Chou+/94]
• Imputation: studied in statistics [Darroch+/1972]
• Application of such models to estimation of such rare events (rates of ~10^-3) is novel
21
Conclusions
We presented a method to estimate rates of extremely rare events at multiple resolutions under severe sparsity constraints
Our method has two parts
• Imputation: incorporates hierarchy, fixes sampling bias
• Tree-structured generative model: extremely fast parameter fitting
22
Rare rate modeling
1. Freeman-Tukey transform
Distinguishes between regions with zero clicks based on the number of impressions
Variance stabilizing transformation: Var(y) is independent of E[y] (needed in further modeling)
[Equation: transform of # clicks in region r and # impressions in region r]
23
Rare rate modeling
Generative Model
• Sij values can be quickly estimated using a Kalman filtering algorithm
• Kalman filter requires knowledge of β, V, and W
  • EM wrapped around the Kalman filter
[Figure labels: filtering, smoothing]
24
Rare rate modeling
Fitting using a Kalman filtering algorithm
• Filtering: Recursively aggregate data from leaves to root
• Smoothing: Propagate information from root to leaves
Complexity: linear in the number of regions, for both time and space
[Figure labels: filtering, smoothing]
25
Rare rate modeling
Fitting using a Kalman filtering algorithm
• Filtering: Recursively aggregate data from leaves to root
• Smoothing: Propagate information from root to leaves
Kalman filter requires knowledge of β, V, and W
• EM wrapped around the Kalman filter
[Figure labels: filtering, smoothing]
26
Imputing xij
[Figure: levels Z(i) and Z(i+1) of the region hierarchy]
Iterative Proportional Fitting [Darroch+/1972]
Initialize xij = nij + mij
Top-down:
• Scale all xij in every block in Z(i+1) to sum to its parent in Z(i)
• Scale all xij in Z(i+1) to sum to the row totals
• Scale all xij in Z(i+1) to sum to the column totals
Repeat for every level Z(i)
Bottom-up: Similar
[Figure labels: block, page classes, ad classes]
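The first top-down step, scaling a block in Z(i+1) so that it sums to its parent cell in Z(i), can be sketched as a single proportional rescale. The function name and the list-of-lists layout are assumptions made for illustration.

```python
def scale_block_to_parent(block, parent_total):
    """Rescale every cell in a child block so the block sums to its parent's value.

    block: list of lists of cell values in Z(i+1) that share one parent region.
    parent_total: the value of that parent region in Z(i).
    """
    s = sum(sum(row) for row in block)
    if s == 0:
        # Nothing to scale proportionally; return the block unchanged.
        return [row[:] for row in block]
    factor = parent_total / s
    return [[v * factor for v in row] for row in block]


scaled = scale_block_to_parent([[1.0, 1.0], [2.0, 4.0]], parent_total=16.0)
```

The row and column scalings listed on the slide would then be applied to the whole level before moving to the next one.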