
Hierarchical Bayes

Peter Lenk
Stephen M. Ross School of Business, University of Michigan
September 2004

2

Outline

• Bayesian Decision Theory
• Simple Bayes and Shrinkage Estimates
• Hierarchical Bayes
• Numerical Methods
• Batting Averages
• HB Interaction Model

3

Bayesian Decision Model

[Diagram: the prior and likelihood combine through Bayes Theorem (posterior updating) in the inference model; actions, states, consequences, and the loss function form the decision model; integrating the two yields the result, the optimal decision: marketing action & inference. The inference and decision models are separable parts: don't mix them.]

4

Bayes Theorem

• Model for the data given parameters
– f(y | θ) where θ = unknown parameters
– E.g., Yi = µ + εi and θ = (µ, σ)
– Likelihood l(θ) = f(y1 | θ) f(y2 | θ) … f(yn | θ)

• Prior distribution of parameters p(θ)
• Update prior
– p(θ | Data) = l(θ) p(θ) / f(y)
– f(y) = marginal distribution of the data

5

Easy Example:

• Estimate the mean of a normal distribution.
• Yi = µ + εi
• Error terms {εi} are iid normal
– Mean is zero
– Standard deviation of the error terms is σ
– Assume that σ is known

6

Conjugate Prior for Mean

• Prior distribution for µ is normal
– Prior mean is m0
– Prior variance is v0²
– Precision is 1/v0²

7

Posterior Distribution

• Observe n data points
• Posterior distribution is normal
– Mean is mn
– Variance is vn²

$$m_n = w \bar{y} + (1 - w) m_0, \qquad w = \frac{n/\sigma^2}{n/\sigma^2 + 1/v_0^2}, \quad 0 < w < 1, \qquad \frac{1}{v_n^2} = \frac{n}{\sigma^2} + \frac{1}{v_0^2}$$

8

Shrinkage Estimators

• Bayes estimators combine prior guesses with sample estimates

• If the prior precision is larger than the sample precision (the prior has more information), then put more weight on the prior mean.

• If the sample precision is larger than the prior precision (the sample has more information), then put more weight on the sample average.

9

Example

• Y is normal with mean 10 and variance 16
• Normal prior for the population mean
– Mean = 5 & variance = 2
– Prior is informative and way off
• Data
– n = 5, average = 10.9, variance = 14.7
• Posterior is normal
– Mean = 7.4 and variance = 1.2
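The arithmetic behind these numbers is easy to check. Here is a minimal Python sketch of the conjugate update (the function name is ours; the slide's sample variance 14.7 stands in for σ²):

```python
# Conjugate normal update for a mean with known variance:
# posterior precision = prior precision + data precision.
def normal_posterior(m0, v0_sq, ybar, sigma_sq, n):
    prior_prec = 1.0 / v0_sq
    data_prec = n / sigma_sq
    post_var = 1.0 / (prior_prec + data_prec)
    w = data_prec / (prior_prec + data_prec)     # weight on the sample mean
    post_mean = w * ybar + (1.0 - w) * m0
    return post_mean, post_var

# Slide's numbers: prior N(5, 2), n = 5, ybar = 10.9, sample variance 14.7.
print(normal_posterior(5.0, 2.0, 10.9, 14.7, 5))   # -> (~7.4, ~1.2)
```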

10

Prior & Posterior n=5

[Figure: prior, likelihood, and posterior densities of the mean for n = 5; the posterior mean falls between the prior mean and the sample mean.]

11

Prior & Posterior n=50

[Figure: prior, likelihood, and posterior densities of the mean for n = 50.]

12

Use Less Informative Prior

• Y is normal with mean 10 and variance 16
• Normal prior for the population mean
– Mean = 5 & variance = 10 instead of 2
– Prior is "flatter"
• Data
– n = 5, average = 10.9, variance = 14.7
• Posterior is normal
– Mean = 9.6 and variance = 2.3

13

Prior & Posterior n=5

[Figure: prior, likelihood, and posterior densities for n = 5 with the flatter prior.]

14

Prior & Posterior n=50

[Figure: prior, likelihood, and posterior densities for n = 50 with the flatter prior.]

15

Summary

• Prior has less effect as sample size increases

• Very informative priors give good results with smaller samples if prior information is correct

• If you really don’t know, then use “flatter” or less informative priors

16

What about Marketing?

• HB revolution in how we think about customers

17

Henry Ford: All Customers are the Same

18

Alfred Sloan: Several Common Preferences

19

Continuous Heterogeneity

20

Profit Maximization

21

It Can Get Wild!

22

HB Model for Weekly Spending

• Within-subjects model:
Yi,j = µi + εi,j and var(εi,j) = σi²
• Heterogeneity in mean weekly spending (between subjects):
µi = θ + δi and var(δi) = τ²
• Prior distribution:
θ is N(u0, v0²)
• Variances are known

23

Variances & Covariances

• Var(Yi,j | µi) = σi² (known µi)
• Var(Yi,j | θ) = τ² + σi² (unknown µi)
• Cov(Yi,j, Yi,k) = τ² for j not equal to k
• Observations from different subjects are independent
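A quick simulation check of these identities; all parameter values below are illustrative, not from the slides:

```python
import numpy as np

# Simulate the two-level model with made-up values for theta, tau, sigma.
rng = np.random.default_rng(0)
theta, tau, sigma = 20.0, 3.0, 4.0
n_subj = 200_000
mu = rng.normal(theta, tau, n_subj)            # subject means
y_j = mu + rng.normal(0, sigma, n_subj)        # observation j for each subject
y_k = mu + rng.normal(0, sigma, n_subj)        # observation k, same subjects

print(np.var(y_j))             # ~ tau^2 + sigma^2 = 25
print(np.cov(y_j, y_k)[0, 1])  # ~ tau^2 = 9
```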

24

Precisions = 1/Variance

$$\Pr(\theta) = \frac{1}{v_0^2} \quad \text{is the prior precision}$$

$$\Pr(\mu_i \mid \theta) = \frac{1}{\tau^2}$$

$$\Pr(Y_{i,j} \mid \mu_i) = \frac{1}{\sigma_i^2} \quad \text{and} \quad \Pr(\bar{Y}_i \mid \mu_i) = \frac{n_i}{\sigma_i^2}$$

$$\Pr(Y_{i,j} \mid \theta) = \frac{1}{\tau^2 + \sigma_i^2} \quad \text{and} \quad \Pr(\bar{Y}_i \mid \theta) = \frac{1}{\tau^2 + \sigma_i^2 / n_i}$$

25

Joint Distribution

$$P(Y, \mu, \theta) = \underbrace{h(\theta \mid u_0, v_0^2)}_{\text{Prior}} \prod_{i=1}^{N} \underbrace{g(\mu_i \mid \theta, \tau^2)}_{\text{Between subjects}} \prod_{j=1}^{n_i} \underbrace{f(y_{i,j} \mid \mu_i, \sigma_i^2)}_{\text{Within subjects}}$$

26

Bayes Theorem

$$P(\mu, \theta \mid Y) = \frac{P(Y \mid \mu, \theta)\, P(\mu, \theta)}{\int\!\!\int P(Y \mid \mu, \theta)\, P(\mu, \theta)\, d\mu\, d\theta} = \frac{P(Y \mid \mu, \theta)\, P(\mu, \theta)}{P(Y)}$$

The denominator is constant because Y is fixed & known.

27

Bayes Estimator

• Posterior means are optimal under squared error loss:
E(µi | Y) and E(θ | Y)

• The measure of accuracy is the posterior variance:
var(µi | Y) and var(θ | Y)

28

Posterior Distribution of θ

• Normal distribution
• Posterior mean is uN
• Posterior variance is vN²
• Posterior precision is Pr(θ | Y) = 1/vN²

29

Posterior Precision of θ “Pr” = Precision = 1/Variance

$$\Pr(\theta \mid Y) = \Pr(\theta) + \sum_{i=1}^{N} \Pr(\bar{Y}_i \mid \theta), \qquad \Pr(\theta) = \frac{1}{v_0^2}, \quad \Pr(\bar{Y}_i \mid \theta) = \frac{1}{\tau^2 + \sigma_i^2 / n_i}$$

30

Posterior Mean of θ

$$u_N = w_0 u_0 + \sum_{i=1}^{N} w_i \bar{Y}_i, \qquad w_0 = \frac{\Pr(\theta)}{\Pr(\theta \mid Y)}, \quad w_i = \frac{\Pr(\bar{Y}_i \mid \theta)}{\Pr(\theta \mid Y)}$$

31

Updating of θ

• Prior mean u0 → Posterior mean uN
• Prior variance v0² → Posterior variance vN²

32

Posterior Mean of µi

$$E[\mu_i \mid Y] \approx \alpha_i \bar{Y}_i + (1 - \alpha_i) u_N, \qquad \alpha_i = \frac{\Pr(\bar{Y}_i \mid \mu_i)}{\Pr(\mu_i \mid Y)}$$

$$\Pr(\mu_i \mid Y) = \Pr(\mu_i \mid \theta) + \Pr(\bar{Y}_i \mid \mu_i) = \frac{1}{\tau^2} + \frac{n_i}{\sigma_i^2}$$
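Putting the last four slides together, a short Python sketch of these closed-form HB estimates; function, variable names, and the data are ours, and all variances are treated as known:

```python
import numpy as np

# Closed-form HB estimates with known variances.
def hb_normal_means(ybar, n, sigma_sq, tau_sq, u0, v0_sq):
    prec_ybar = 1.0 / (tau_sq + sigma_sq / n)        # Pr(Ybar_i | theta)
    post_prec = 1.0 / v0_sq + prec_ybar.sum()        # Pr(theta | Y)
    uN = (u0 / v0_sq + (prec_ybar * ybar).sum()) / post_prec
    alpha = (n / sigma_sq) / (1.0 / tau_sq + n / sigma_sq)  # weight on Ybar_i
    mu_hat = alpha * ybar + (1.0 - alpha) * uN       # shrink toward uN
    return uN, mu_hat

ybar = np.array([12.0, 25.0, 40.0])      # hypothetical subject averages
n = np.array([2.0, 2.0, 2.0])            # two observations per subject
sigma_sq = np.array([16.0, 16.0, 16.0])
print(hb_normal_means(ybar, n, sigma_sq, tau_sq=25.0, u0=25.0, v0_sq=100.0))
```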

33

Between-Subject Heterogeneity in Mean Household Spending

[Figure: heterogeneity distribution of mean household spending across subjects, spending from 0 to 60.]

34

Between & Within Subjects Distributions

[Figure: the heterogeneity distribution together with the within-subject spending distributions for Subjects 1, 2, and 3.]

35

2 Observations per Subject

[Figure: two observations drawn per subject, plotted against the heterogeneity and subject distributions.]

36

Subject Averages

[Figure: each subject's sample average, plotted against the heterogeneity and subject distributions.]

37

Pooled Estimate of Mean

[Figure: the pooled estimate of population average spending, plotted against the heterogeneity and subject distributions.]

38

Sample Estimates

• The disaggregate estimate Ȳi of µi only uses the observations for subject i
– Super if there are 30 or more observations per subject
• The pooled or aggregate estimator Ȳ of θ
– Smaller sampling error
– Ignores individual differences

39

HB Shrinkage Estimator

• Take a combination of the individual average Ȳi and the pooled average Ȳ:

$$w_i \bar{Y}_i + (1 - w_i) \bar{Y}$$

• What are the correct weights?
• HB automatically gives optimal weights based on
– Prior variance of µi
– Number of observations for subject i
– Variance of past spending for subject i
– Number of subjects
– Amount of heterogeneity in household means

40

Shrinkage Estimates

[Figure: shrinkage estimates pull the subject averages toward the pooled mean.]

41

20 Observations per Subject

[Figure: shrinkage estimates with 20 observations per subject; with more data per subject, the estimates shrink less toward the pooled mean.]

42

Bayes & Shrinkage Estimates

• Bayes estimators automatically determine the optimal amount of shrinkage to minimize MSE for the true parameters and predictions

• Borrow strength from all subjects
• Trade some bias for variance reduction

43

Good & Bad News

• Only simple models result in closed-form equations
• The models we use in marketing require numerical methods to compute posterior means, posterior standard deviations, predictions, and so on.

44

Numerical Integration

• Compute the posterior mean of a function T(θ):

$$E[T(\theta) \mid y] = \int T(\theta)\, p(\theta \mid y)\, d\theta$$

45

T(x) and f(x)

[Figure: an integrand T(x) and a density f(x), and their product T(x)f(x), on 0 ≤ x ≤ 1.]

46

Trapezoid Rule

[Figure: the trapezoid rule approximates the area under T(x)f(x) with a grid of trapezoids on 0 ≤ x ≤ 1.]

47

Grid Methods

• Very accurate with few function evaluations
• Need to know where the action is
• Do not scale well to higher dimensions
• You need to be very smart to make it work
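A grid-method sketch in Python using the trapezoid rule; the unnormalized posterior here is a made-up normal-normal example, not one from the slides:

```python
import numpy as np

# Trapezoid rule on a grid. The unnormalized posterior below is a made-up
# example: N(0, 1) prior times a N(1, 0.5^2) likelihood.
def trapezoid(y, x):
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

theta = np.linspace(-4.0, 4.0, 401)
unnorm = np.exp(-0.5 * theta**2) * np.exp(-0.5 * ((theta - 1.0) / 0.5)**2)
post = unnorm / trapezoid(unnorm, theta)   # divide by f(y), the marginal
T = theta**2                               # any function of theta
print(trapezoid(T * post, theta))          # E[theta^2 | y]
```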

48

Monte Carlo

• Generate random draws θ1, θ2, …, θm from the posterior distribution using a random number generator.
• What happened to the density of θ?

$$E[T(\theta) \mid y] \approx \frac{1}{m} \sum_{j=1}^{m} T(\theta_j)$$
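The Monte Carlo version of the same computation; note that the density of θ never appears explicitly, because it is carried by the draws themselves. The normal "posterior" below is a stand-in for a real sampler:

```python
import numpy as np

# Monte Carlo: average T over posterior draws; no density evaluations needed.
rng = np.random.default_rng(1)
draws = rng.normal(loc=0.8, scale=0.45, size=100_000)  # pretend posterior draws
print(np.mean(draws**2))    # approximates E[theta^2 | y]
```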

49

Good & Bad News

• If your computer has a random number generator for the posterior distribution, Monte Carlo is a snap to do.

• Your computer almost never has the correct random number generator.

50

Importance Sampling

• Would like to sample from density f
• You have a good random number generator for the density g
• Importance sampling lets you generate random deviates from g to evaluate expectations with respect to f
• Generate φ1, …, φm from g

51

Importance Sampling Approximation

$$\int T(\theta)\, f(\theta)\, d\theta = \int T(\varphi) \frac{f(\varphi)}{g(\varphi)}\, g(\varphi)\, d\varphi \approx \frac{1}{m} \sum_{i=1}^{m} T(\varphi_i)\, r(\varphi_i), \qquad r(\varphi) = \frac{f(\varphi)}{g(\varphi)}$$

If f is known only up to a constant, use normalized weights:

$$\sum_{i=1}^{m} T(\varphi_i)\, w_i, \qquad w_i = \frac{r(\varphi_i)}{\sum_{j=1}^{m} r(\varphi_j)}, \qquad r \propto \frac{f}{g}$$
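A self-normalized importance-sampling sketch; the target and proposal below are illustrative choices (a Student-t target with no assumed sampler, a normal proposal), and scipy is assumed available:

```python
import numpy as np
from scipy import stats

# Target f: Student-t with 4 df; proposal g: N(0, 2).
rng = np.random.default_rng(2)
m = 100_000
phi = rng.normal(0.0, 2.0, m)                  # draws from g
log_r = stats.t.logpdf(phi, df=4) - stats.norm.logpdf(phi, 0.0, 2.0)
w = np.exp(log_r - log_r.max())                # stabilize before exponentiating
w /= w.sum()                                   # self-normalized weights
print(np.sum(w * phi**2))                      # E_f[T(phi)] with T(x) = x^2; ~2
```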

52

Markov Chain Monte Carlo

• Extension of Monte Carlo
• Random draws are not independent
• The joint distribution f(β, φ) does not have a convenient random number generator
• The conditional distributions g(φ|β) and h(β|φ) are easy to generate from

53

Iterative Generation from Full Conditionals

• Start at φ0
• Generate β1 from h(β|φ0)
• Generate φ1 from g(φ|β1)
• …
• Generate βm+1 from h(β|φm)
• Generate φm+1 from g(φ|βm+1)
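A minimal Gibbs sketch for a normal model with unknown mean and variance, where both full conditionals are standard distributions; the priors and data below are illustrative:

```python
import numpy as np

# Gibbs for (mu, sigma^2): flat prior on mu, inverse-gamma(a0, b0) on sigma^2.
rng = np.random.default_rng(3)
y = rng.normal(10.0, 4.0, size=50)
n, a0, b0 = len(y), 2.0, 10.0
mu, sig2 = y.mean(), y.var()
draws = []
for _ in range(5000):
    mu = rng.normal(y.mean(), np.sqrt(sig2 / n))              # mu | sig2, y
    sse = np.sum((y - mu)**2)
    sig2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + sse / 2))  # sig2 | mu, y
    draws.append((mu, sig2))
print(np.mean(draws, axis=0))    # posterior means of (mu, sigma^2)
```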

54

Baseball Example

• 90 MLB players in the 2000 season
• Observe at bats (AB) and batting averages (BA) in May
• Infer the distribution of batting averages across players
• Predict batting averages in October using data from May

55

Baseball Batting Averages

Player     May BA   May AB   October BA   October AB
Murray     .442     43       .323         436
Belle      .400     45       .317         546
Vizquel    .204     49       .260         542
Pena       .148     27       .262         263

The Cleveland Indians, 1995

56

Estimating a Probability

• n at bats in May
• X = number of hits
• p = batting average for the season
• X has a binomial distribution
– mean np
– variance np(1 − p)

57

Binomial Distribution

[Figure: binomial distribution of sample successes for n = 50, p = 0.3; mean = 15.00, STD = 3.24.]

58

Need Prior for Batting Average p

• 0 < p < 1
• The beta distribution is a popular choice
• It has two parameters: α and β
• The density is proportional to p^(α−1)(1 − p)^(β−1)
• Prior mean = α/(α + β)

59

Beta Prior for p

$$f(p \mid \alpha, \beta) = Beta(p \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1} \quad \text{for } 0 \le p \le 1$$

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\, dx$$

60

Mean and Variance

$$E(p) = \frac{\alpha}{\alpha + \beta}, \qquad V(p) = \frac{E(p)\,[1 - E(p)]}{\alpha + \beta + 1}$$

61

Beta Distribution

[Figure: Beta density with alpha = 1, beta = 1; mean = 0.50.]

62

Beta Distribution

[Figure: Beta density with alpha = 5, beta = 1; mean = 0.83.]

63

Beta Distribution

[Figure: Beta density with alpha = 5, beta = 10; mean = 0.33.]

64

Beta Distribution

[Figure: Beta density with alpha = 0.5, beta = 2; mean = 0.20.]

65

Beta Distribution

[Figure: Beta density with alpha = 0.5, beta = 0.5; mean = 0.50.]

66

Bayes Theorem: Update prior for p after observing n and x

$$f(p \mid x, n, \alpha, \beta) = \frac{\Pr[x \mid p]\, f(p \mid \alpha, \beta)}{\int_0^1 \Pr[x \mid q]\, f(q \mid \alpha, \beta)\, dq} \propto p^{\alpha + x - 1} (1 - p)^{\beta + n - x - 1}$$

$$f(p \mid x, n, \alpha, \beta) = Beta(p \mid \alpha + x,\ \beta + n - x)$$

67

Inference About P:Posterior is also Beta

Prior parameter → Posterior parameter
α → α + x
β → β + n − x

68

Posterior Mean of p: It's Another Shrinkage Estimator

$$E(p \mid x) = w_n \hat{p} + (1 - w_n) \times \text{Prior Mean}, \qquad \hat{p} = \frac{x}{n}, \qquad w_n = \frac{n}{\alpha + \beta + n}$$
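A two-line check of this update, using the prior and data shown on the next slide (Beta(5, 15) prior, x = 18 hits in n = 50 at bats):

```python
# Beta-binomial update: the posterior mean equals the shrinkage form.
alpha0, beta0, n, x = 5.0, 15.0, 50, 18
alpha1, beta1 = alpha0 + x, beta0 + n - x          # posterior Beta(23, 47)
w = n / (alpha0 + beta0 + n)                       # weight on the MLE x/n
shrink = w * (x / n) + (1 - w) * alpha0 / (alpha0 + beta0)
print(alpha1 / (alpha1 + beta1), shrink)           # both ~0.329
```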

69

p=Beta & x=Binomial

[Figure: beta prior with alpha = 5, beta = 15 (prior mean = 0.25) and the posterior after observing N = 50 at bats with X = 18 hits.]

70

Hierarchical Bayes Model

• Variation within batter i:
– Xi given pi has a binomial distribution
• Variation among batters:
– pi has a beta distribution with parameters α and β
• Prior distribution for α and β:
– Gamma (chi-square) distribution

71

Gamma Distribution

$$g(y) = G(y \mid r, s) = \frac{s^r}{\Gamma(r)}\, y^{r - 1} e^{-sy} \quad \text{for } y > 0$$

$$E(Y) = \frac{r}{s} \qquad \text{and} \qquad V(Y) = E(Y) \times \frac{1}{s}$$

72

Gamma Distribution

[Figure: Gamma density with alpha = 1, beta = 0.1; mean = 10.00, STD = 10.00.]

73

Gamma Distribution

[Figure: Gamma density with alpha = 10, beta = 0.5; mean = 20.00, STD = 6.32.]

74

Specify Prior Parameters:r, s, u & v

• Priors: α is G(r, s) & β is G(u, v)
• E(α) = r/s and V(α) = E(α)/s
• s determines the variance relative to the mean
• I used s = 0.25, so the variance is four times larger than the mean
• Same for v

75

$$E(p) = E\big[\,E(p \mid \alpha, \beta)\,\big] = \frac{r}{r + u} = p_0$$

$$c = V\big[\,E(p \mid \alpha, \beta)\,\big] = \frac{p_0 (1 - p_0)}{r + u + 1}$$

$$r = p_0 \left( \frac{p_0 (1 - p_0)}{c} - 1 \right) \qquad \text{and} \qquad u = (1 - p_0) \left( \frac{p_0 (1 - p_0)}{c} - 1 \right)$$
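These formulas reproduce the parameters on the next slide:

```python
# Moment-matching formulas from the slide above.
p0, c = 0.25, 0.01               # guessed mean and its uncertainty
k = p0 * (1 - p0) / c - 1        # = 17.75
r = p0 * k                       # ~4.4
u = (1 - p0) * k                 # ~13.3
print(r, u)
```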

76

Specify Prior Parameters

• Guess a mean for all batting averages: p0 = 0.25
• Measure of my uncertainty in that guess: c = 0.01
• Parameter r = 4.4
• Parameter u = 13.3

77

MCMC for Batting Averages

• Need full conditionals for pi given α and β
– Beta distribution
• Need full conditionals for α and β given the pi
– Unknown distribution
– Use the Metropolis algorithm

78

MCMC: Full Conditionals for Player i Batting Average pi

$$f(p_i \mid x_i, \alpha, \beta) \propto \Pr[x_i \mid p_i]\, f(p_i \mid \alpha, \beta) \propto p_i^{\alpha + x_i - 1} (1 - p_i)^{\beta + n_i - x_i - 1}$$

$$f(p_i \mid x_i, \alpha, \beta) = Beta(p_i \mid \alpha + x_i,\ \beta + n_i - x_i)$$

79

MCMC: Full Conditional for α and β

$$g(\alpha, \beta \mid x_1, \ldots, x_n, p_1, \ldots, p_n) \propto \left[ \prod_{i=1}^{n} f(p_i \mid \alpha, \beta) \right] g(\alpha \mid r, s)\, g(\beta \mid u, v) \propto \left[ \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \right]^{n} \prod_{i=1}^{n} p_i^{\alpha - 1} (1 - p_i)^{\beta - 1}\; g(\alpha \mid r, s)\, g(\beta \mid u, v)$$

80

Metropolis Algorithm

• Want to generate θ from f
• Instead, generate a candidate value φ from g(·|θ)
– Density g can depend on θ
– E.g., random walk: φ = θ + δ
• With probability α(θ, φ), accept φ as the new value of θ
• With probability 1 − α(θ, φ), keep θ

81

Transition Probability

$$\alpha(\theta, \varphi) = \min\left\{ \frac{f(\varphi)\, g(\theta \mid \varphi)}{f(\theta)\, g(\varphi \mid \theta)},\ 1 \right\}$$

• f is the full conditional density of θ
• Ratios: you do not need to know the normalizing constants
• Usually compute α on the log scale
• Works if the densities are not zero
• Works better if g is close to f
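A random-walk Metropolis step in Python, computed on the log scale as suggested; with a symmetric proposal, the g ratio cancels. The log_f target and all values below are illustrative:

```python
import numpy as np

# One random-walk Metropolis step; log_f is any unnormalized log density.
def metropolis_step(theta, log_f, step, rng):
    phi = theta + rng.normal(0.0, step)        # symmetric proposal: g cancels
    log_alpha = min(0.0, log_f(phi) - log_f(theta))
    return phi if np.log(rng.uniform()) < log_alpha else theta

# Toy target: N(3, 1), known only up to a constant.
rng = np.random.default_rng(4)
log_f = lambda t: -0.5 * (t - 3.0)**2
theta, draws = 0.0, []
for _ in range(10_000):
    theta = metropolis_step(theta, log_f, step=1.0, rng=rng)
    draws.append(theta)
print(np.mean(draws), np.std(draws))   # ~3 and ~1
```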

82

Alpha and Beta vs Iteration

[Figure: trace plots of ALPHA and BETA across 2,000 MCMC iterations.]

83

Posterior of Alpha

[Figure: histogram of the posterior draws of alpha, roughly 15 to 41.]

84

Posterior of Beta

[Figure: histogram of the posterior draws of beta, roughly 38 to 105.]

85

Parameter Estimates

Parameter   Prior Mean (std)   Posterior Mean (std)
α           17.8 (8.4)         26.2 (4.6)
β           53.2 (14.6)        68.2 (11.7)

86

Distribution of Batting Averages

[Figure: prior and posterior (after May) distributions of batting averages on 0 to 1.]

87

Prediction of Season Averages

Estimator   RMSE    MAPE
MLE         0.060   17.0%
Bayes       0.032   9.4%

88

Batting Averages: Bayes Shrinks the MLE

[Figure: season 2000 batting averages versus May 2000 estimates; the Bayes estimates are pulled toward the overall mean relative to the MLEs.]

89

HB Conjoint: Lenk, DeSarbo, Green, and Young (1996)

• Evaluated computer profiles on a 0 to 10 scale for "likelihood to purchase"
– 0 = would not buy
– 10 = would definitely buy
• Design
– 178 subjects
– 13 attributes with two levels each
– 20 profiles per subject

90

Attributes: Effect Coding

1. Hotline support
2. RAM
3. Screen size
4. CPU
5. Hard disk
6. Multimedia
7. Cache
8. Color
9. Retail store
10. Warranty
11. Software
12. Guarantee
13. Price

91

Subject-Covariates

• Female: 1 if female and 0 if male
• Years: # years of work experience
• Own: 1 if owns a computer & 0 else
• Nerd: 1 if technical background & 0 else
• Apply: # of software applications
• Expert: self-reported expertise rating

92

Summary Statistics for Covariates

Variable   Mean    Std Dev   MIN   MAX
FEMALE     0.275   0.448     0     1
YEARS      4.416   2.369     1     18
OWN        0.876   0.330     0     1
NERD       0.275   0.448     0     1
APPLY      4.287   1.574     1     9
EXPERT     7.618   1.902     3     10

93

Interaction Model

• Within-subjects:

$$Y_i = X_i \beta_i + \varepsilon_i \quad \text{for } i = 1, \ldots, n, \qquad \varepsilon_i \mid \sigma^2 \sim N(0,\ \sigma^2 I_{m_i})$$

• Between-subjects heterogeneity:

$$\beta_i = \Theta' z_i + \delta_i, \qquad \delta_i \sim N_p(0, \Delta)$$
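A sketch that generates data from this two-level model with the study's dimensions (178 subjects, 20 profiles, 13 two-level attributes plus an intercept, 6 covariates plus an intercept); the values of Θ, ∆, and σ below are invented for illustration:

```python
import numpy as np

# Generate data from the two-level interaction model.
rng = np.random.default_rng(5)
n_subj, m, p, q = 178, 20, 14, 7           # subjects, profiles, partworths, covariates
Theta = rng.normal(0.0, 0.3, (q, p))       # covariate-to-partworth coefficients
Delta = 0.2 * np.eye(p)                    # heterogeneity covariance
sigma = 1.0
Z = rng.normal(0.0, 1.0, (n_subj, q))      # subject covariates
Z[:, 0] = 1.0                              # intercept covariate
X = rng.choice([-1.0, 1.0], (n_subj, m, p))  # effect-coded profiles
X[:, :, 0] = 1.0                           # intercept column
beta = Z @ Theta + rng.multivariate_normal(np.zeros(p), Delta, n_subj)
Y = np.einsum('imp,ip->im', X, beta) + rng.normal(0.0, sigma, (n_subj, m))
print(Y.shape)   # (178, 20): 20 ratings per subject
```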

94

Average of Posterior Means and Std Devs of Partworths Across Subjects

Attribute   PostMean   PostSTD
CNST        4.757      1.404
HotLine     0.095      0.487
RAM         0.347      0.467
ScrnSz      0.193      0.405
CPU         0.392      0.646
HrdDsk      0.171      0.501
MultMd      0.494      0.574
Cache       0.031      0.461
Color       0.026      0.371
Dstrbtn     0.078      0.378
Wrrnty      0.124      0.392
Sftwr       0.196      0.399
Grntee      0.112      0.427
Price       -1.127     0.873

95

Impact of Covariates on Partworths

Columns: CNST, RAM, CPU, Dstrbtn, Wrrnty, Grntee, Price

CNST     3.74    0.52    -0.15   0.05    -0.01   0.03    -1.55
FEMALE   -0.10   0.06    0.12    0.40
YEARS    -0.11
OWN      -0.10   0.17    0.17    0.20    -0.12   -0.20
NERD     -0.27   0.15    0.16    -0.14
APPLY    0.10
EXPERT   0.17

96

Summary

• HB allows individual-level coefficients
• Two-level model
– Within subjects
– Between subjects (heterogeneity)

• HB shrinks unstable, subject-level estimators toward the population mean

97

Summary

• Bayesian decision theory (BDT) provides an integrated framework for making decisions and inferences

• Good models consider all sources of uncertainty

• Good methods keep track of all sources of uncertainty

• Bayes does both
