European Central Bank June 2018 Kay Brodersen and Art Owen ... · Modeling with Bayesian structural...

Preview:

Citation preview

1

Out of controlsHal Varian

Incorporating work by Steve Scott,Kay Brodersen and Art Owen

European Central BankJune 2018

Opinions expressed in these slides are solely those of the author.

● Randomized experiments are gold standard for causality

● But controls can be costly○ Opportunity cost○ Actual costs

● Control groups are only one way of estimating counterfactual

● This talk is about alternatives:○ Synthetic controls○ Regression discontinuity

Outline

Motivating problem

An advertiser contemplates changing its (bid, budget, creative) and wants to know what will happen to some performance measure (clicks, revenue, conversions). Solution: run an experiment.

Some possible designs

● Apply treatment for some users, compare to non-treated users● Apply treatment for some geos, compare to non-treated geos● Apply treatment for some advertisers compare to similar advertisers that

did not get treatment● Apply treatment and compare actual to prediction of would have

happened without the treatment (interrupted regression, synthetic control)

● Last method is nice since don’t have to explicitly manage controls● We want an “automatic” way to build predictive models for counterfactual

● Time series methods to model seasonality and trend● Regression methods incorporate contemporaneous predictors

Time series + regression component

Kalman filter for time series component● Handles trend and seasonality● Bayesian-friendly● Andrew Harvey [1989], Durbin and Koopman [2012]

Spike-and-slab for regression component● George-McCulloch [1997]); Madigan-Raftery [1994] ● Probability variable is included in regression (spike)● Probability distribution over coefficient value (slab)● Sample from simulated posterior, average to get point prediction● See Scott and Varian (2013, 2014) for details

Steve Scott’s “Bayesian Structural Time Series”● Download R package from CRAN (BoomSpikeSlab, bsts)

Steve Scott

Confidential + Proprietary

Modeling with Bayesian structural time series

Model Trend Seasonal Regression= + +

● Trend and Seasonal components are adaptive and nonparametric.

● Can add other model components as necessary (e.g. holidays,

weather).

● The regression component uses Google Trends as predictors.

○ Could use other predictors as well

● Accounting for the time series structure of the problem is

important.

Confidential + Proprietary

One slide tutorial on Kalman forecastingTwo important time series models:

Both have obvious predictors. Kalman models the in-between cases...

Best predictor for mt= E t for in-between model turns out to be:

Confidential + Proprietary

Basic structural model

Stochastic generalization of constant trend model:

We add a regression component:

But not just any old regression….

Confidential + Proprietary

Spike and slab regression for variable selection

George and McCulloch (1997)

Confidential + Proprietary

Estimation by MCMC sampler

1. Draw Kalman state variables (level, slope, etc) and regression parameters and from posterior

2. Draw variances of the state components3. Repeat a few thousand times4. Results

a. Posterior distn of variances in state componentsb. Posterior distn of inclusion for each predictor c. Posterior distn of coefficient values d. Posterior distn of forecast yt

5. Point estimates are averages over these distributions6. Model uncertainty is natural due to huge number of possible

models

Confidential + Proprietary

Estimating a model using bsts

Example: initial claims for unemployment benefits

Grey bars indicate recessions

Confidential + Proprietary

Does Google data help explain initial claims?

Cumulative absolute 1-step-ahead prediction errors

Plain time series model vs. model that includes Google Trends (correlate) data.

Models perform about the same when things are "boring".

Google Trends helps the model react to sudden changes (e.g. the recession).

Confidential + Proprietary

What parts of the model did the explaining?

● The output is a "dynamic distribution" that shows the uncertainty around each component’s contribution.

● The regression help prediction

Confidential + Proprietary

Which variables were important in the regression?

The important search terms:● unemployment office● filing for unemployment● Idaho unemployment

(Almost any state would do, but Idaho is slightly more predictive than the others in these data).

Other 95 search terms have inclusion probability less than 0.1.

The predictions from this model average over which variables are in/out.

Sirius internet radio is spurious.

New Home Sales in US

Raw correlation

Predictors chosen by model

model: yt = trendt + seasonalt + b1 x1t + b2 x2t

plot1: yt = trendt

plot2: yt = trendt + seasonalt

plot3: yt = trendt + seasonalt + b1 x1t

plot4: yt = trendt + seasonalt + b1 x1t + b2x2t

Incremental fit plots

Visualize how much each predictor contributes to model fit

Confidential & Proprietary

Trend

Confidential & Proprietary

Seasonal

Confidential & Proprietary

[appreciation rate]

Confidential & Proprietary

[irs 1031]

Confidential & Proprietary

[century 21 realtors]

Confidential & Proprietary

[real estate purchase]

Confidential & Proprietary

[80-20 mortgage]

Another approach: use query categories

Advantage: exact queries may not persist in future but categories will (probably).

1. all 148 commercial categories;

2. only the 8 real estate categories;

3. no regression predictors

Out of sample prediction

Estimate up until time t and then freeze all the posterior distributions. This freezes regression and the variance posteriors, but allows for Kalman updating and regression predictions.

Same thing for motor vehicle sales

Motor vehicles out of sample forecast

Predict recessions at different forecast horizonsBased on Berge, Sinha, Smolyansky (2016), “Which market indicators best forecast recessions?” FEDS Notes. Logistic model, BMA.

Predicting recessions 0 months out

Predicting recessions 0 months out

Predicting recessions 3 months out

Predicting recessions 3 months out

Predicting recessions 12 months out

Predicting recessions 12 months out

Why time series estimation is important: persistence

Not in recession In recession

Not in recession 0.84 0.01

In recession 0.01 0.14T

T+1

Predictors are essentially the same as without a time series model, but you assign higher probability to recessions.

Motivating example for causal inferenceAn advertiser contemplates changing its (bid, budget, creative) and wants to know what will happen to its (clicks, revenue, conversions). Apply treatment and compare actual to prediction of would have happened without the treatment (interrupted regression, synthetic control).

Brodersen, Gallusser, Koehler, Remy and Scott Annals of Applied Statistics, vol. 9 (2015), pp. 247-274

Brodersen and Varian (2016) Estimating online ad effectiveness: a practical guide

Causal inferenceCan build model of counterfactual: train, test, treat, compare. Related to “synthetic control” in Abadie (2003, 2010).

Cross section model: treated v untreated regions

Time series model: treated v untreated times (Trends)

Predicted clicks in untreated regions (Trends)

Another ad experiment: clicks and sales

Ad clicks Sales

Cannibalization of organic clicks

Many ways to predict counterfactual

(joint work with Kay Brodersen)

● BSTS○ Extrapolation○ Boxcar variable

■ All predictors■ Top predictors

● Linear model○ Simple linear model with extrapolation○ Simple linear model with boxcar variable

● Deseasonalize data or not○ Deseasonalized with extrapolation○ Deseasonalized with boxcar○ Use weekly data

BSTS extrapoliation

Train Treat & compare Test

Boxcar variable

Simple linear model

● Two predictors from Trends● July 4 dummy variable

Deseasonalized data

● Day of week dummies● July 4 dummy● Explain other spike

Weekly data

Comparison

Regression discontinuity

● Motivating examples○ Causal impact of incentive program○ Causal impact of merit scholarships

● Estimation methods○ Randomized controlled trial (RCT): good

statistical properties○ Regression discontinuity (RDD): no

disruption of ordering● HYBRID = RCT + RDD

○ Related literature: tie-breaker design: if two subjects have same score, use lottery to break ties

Art Owen

Local randomization

Set up the model

Benefit of randomization: tighter coefficient estimates

Cost of randomization: ranking is perturbed

● Alas, randomization disturbs the ranking● Tradeoff: tighter estimate vs noisier

ordering● Example of classic “explore v exploit”

tradeoff○ More experimentation now leads to

better optimization in future○ Randomize now, reap rewards come

later!

Objective function

Optimal ∆

Optimal ∆ plot

Summary

Recommended