
Calibrated Bayes: an attractive framework for official statistics in the 21st century

Roderick J. Little

Overview

• Design-based versus model-based survey inference

• Current orthodoxy: design-model compromise

– Strengths and drawbacks

• An alternative: Calibrated Bayes

• Two US Census Bureau applications

– Disclaimer: views are mine, not US Census Bureau

NTTS 2015: Calibrated Bayes 2









Survey estimation

• Design-based inference: population values are fixed, and inference is based on the probability distribution of sample selection. This assumes that we have a probability sample (or “quasi-randomization”, where we pretend that we have one)

• Model-based inference: survey variables are assumed to come from a statistical model. Probability sampling is not the basis for inference, but is useful for making the sample selection ignorable (see e.g. Gelman et al., 2003; Little 2004)


Design vs model-based survey inference

• Two main variants of model-based inference:

– Superpopulation models: Frequentist inference based on repeated samples from a “superpopulation” model

– Bayes: add prior distribution for parameters; inference about finite population quantities or parameters based on posterior distribution

• A fascinating part of the more general debate about frequentist versus Bayesian inference in statistics at large:

– Design-based inference is inherently frequentist

– Purest form of model-based inference is Bayes


Design-based inference

$Y = (Y_1, \ldots, Y_N)$ = population values (fixed); $Z$ = design variables

$Q = Q(Y, Z)$ = finite population quantity

$I = (I_1, \ldots, I_N)$ = sample inclusion indicators (random), where $I_i = 1$ if unit $i$ is included in the sample and $I_i = 0$ otherwise

$Y_{\text{inc}}$ = part of $Y$ included in the survey

$\hat{q} = \hat{q}(Y_{\text{inc}}, I, Z)$ = sample estimate of $Q$

$\hat{V} = \hat{V}(Y_{\text{inc}}, I, Z)$ = sample estimate of $V$, the variance of $\hat{q}$

$\left(\hat{q} - 1.96\sqrt{\hat{V}},\ \hat{q} + 1.96\sqrt{\hat{V}}\right)$ = 95% confidence interval for $Q$
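As a small illustration of the design-based quantities, the sketch below computes an estimate, a variance estimate and a 95% confidence interval for a population total. It assumes simple random sampling without replacement, which the slide does not specify; the function name `srs_total_interval` and the simulated population are hypothetical.

```python
import math
import random

def srs_total_interval(y_inc, N, z=1.96):
    """Design-based inference for a population total, ASSUMING simple random
    sampling without replacement (an assumption made for this sketch)."""
    n = len(y_inc)
    ybar = sum(y_inc) / n
    s2 = sum((y - ybar) ** 2 for y in y_inc) / (n - 1)  # sample variance
    q_hat = N * ybar                                    # estimate of the total Q
    v_hat = N ** 2 * (1 - n / N) * s2 / n               # estimated variance of q_hat
    half = z * math.sqrt(v_hat)
    return q_hat, v_hat, (q_hat - half, q_hat + half)

# Hypothetical fixed population; only the sample inclusion is random.
rng = random.Random(2015)
Y = [rng.gauss(50, 10) for _ in range(1000)]
included = rng.sample(range(len(Y)), 100)
y_inc = [Y[i] for i in included]

q_hat, v_hat, ci = srs_total_interval(y_inc, N=len(Y))
print("true Q:", round(sum(Y), 1))
print("estimate:", round(q_hat, 1), "95% CI:", tuple(round(c, 1) for c in ci))
```

The interval relies on a normal approximation to the sampling distribution of the estimate, so its coverage is only guaranteed as the sample size grows.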

Choice of q̂

Seek good design-based properties:

• Design unbiasedness: $E(\hat{q} \mid Y) = Q$ (too strong)

• Or weaker: design consistency: $\hat{q} \to Q$ as sample size gets large

There are many choices of design-consistent estimates ...

It is natural to seek an estimate that is design-efficient. However, this kind of optimality is not possible without a model (Horvitz and Thompson 1952, Godambe 1955)

Many survey estimates are motivated by implicit models:

Regression model → regression estimator

Ratio model → ratio estimator, etc.

Limitations of design-based approach

• Inference is based on probability sampling, but true probability samples are harder and harder to come by:

– Noncontact and nonresponse are increasing

– Face-to-face interviews increasingly expensive

– High proportion of available information is now not based on probability samples (e.g. internet, administrative data)

• Theory is basically asymptotic -- limited tools for small samples, e.g. small area estimation


[Figure: cartoon of the “Asymptotia Highlands” and the “Murky sub-asymptotial forests”, with the question “How many more to reach the promised land of asymptotia?”]

Design-based methods live in the land of asymptotia

Model-based approaches

• In model-based, or model-dependent, approaches, models are the basis for the entire inference: estimator, standard error, interval estimation

• Two variants:

– Superpopulation modeling

– Bayesian (full probability) modeling

• Common theme is to “infer” or “predict” about non-sampled portion of the population, conditional on the sample and model

• Superpopulation is super, but Bayes is better … for small samples


Bayes inference for surveys

Model: $p(Y \mid Z)$ = prior distribution for $Y$

Data: $Y_{\text{inc}}$ = sampled values of $Y$; $Z$ = design variables

Inference about $Q = Q(Y, Z)$ is based on the posterior predictive distribution $p(Q(Y, Z) \mid Y_{\text{inc}}, Z)$

In particular:

One estimate is the posterior mean: $\hat{q} = E(Q \mid Y_{\text{inc}}, Z)$

Standard error is the posterior sd: $\sqrt{\mathrm{Var}(Q \mid Y_{\text{inc}}, Z)}$

95% posterior probability interval plays the role of confidence interval (with a simpler interpretation)
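To make the idea of predicting the non-sampled part of the population concrete, here is a minimal sketch for a binary survey variable. It assumes the units are exchangeable Bernoulli draws with a Beta(a, b) prior on the success probability and an ignorable sample of n out of N units; this model and the function name `posterior_total` are illustrative assumptions, not taken from the talk. Under those assumptions the non-sampled count is Beta-Binomial, so the posterior mean and variance of the population count Q have closed forms.

```python
def posterior_total(x, n, N, a=1.0, b=1.0):
    """Posterior mean and variance of Q = number of successes in the population.

    ASSUMED model (for illustration): y_i ~ Bernoulli(theta), theta ~ Beta(a, b),
    with x successes observed among n sampled units out of N.
    """
    a_post, b_post = a + x, b + n - x      # Beta posterior for theta
    m = N - n                              # number of non-sampled units
    s = a_post + b_post
    # Beta-Binomial mean and variance of the non-sampled count
    mean_unsampled = m * a_post / s
    var_unsampled = m * a_post * b_post * (s + m) / (s ** 2 * (s + 1))
    # The sampled count x is known, so it adds to the mean but not the variance
    return x + mean_unsampled, var_unsampled

q_mean, q_var = posterior_total(x=2, n=10, N=110)
print("posterior mean of Q:", q_mean)
print("posterior sd of Q:", q_var ** 0.5)
```

The posterior sd plays the role of the standard error: it shrinks to zero as the sample approaches a census, because less of the population remains to be predicted.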


Parametric models

Usually the prior distribution is specified via parametric models:

$p(Y \mid Z) = \int p(Y \mid Z, \theta)\, p(\theta \mid Z)\, d\theta$

$p(Y \mid Z, \theta)$ = parametric model, as in superpopulation approach

$p(\theta \mid Z)$ = prior distribution for $\theta$

Inference about $\theta$ is then obtained from its posterior distribution, computed via Bayes’ Theorem:

$p(\theta \mid Y_{\text{inc}}, Z) \propto p(\theta \mid Z) \times L(\theta \mid Y_{\text{inc}}, Z)$

$L(\theta \mid Y_{\text{inc}}, Z)$ = likelihood function

That is: Posterior ∝ Prior × Likelihood
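The "prior times likelihood" recipe can be checked numerically on a grid. The sketch below assumes a binomial likelihood with a flat prior on the proportion and hypothetical data (6 successes in 105 trials); the function name `grid_posterior` is illustrative. For this conjugate case the exact posterior is Beta(x + 1, n - x + 1), which gives a reference value for the grid result.

```python
def grid_posterior(x, n, grid_size=2001):
    """Posterior for a binomial proportion theta on an evenly spaced grid,
    computed as prior x likelihood and then normalized (flat prior ASSUMED)."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size
    likelihood = [t ** x * (1 - t) ** (n - x) for t in thetas]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return thetas, [u / total for u in unnormalized]

x, n = 6, 105                      # hypothetical data
thetas, post = grid_posterior(x, n)
grid_mean = sum(t * w for t, w in zip(thetas, post))
exact_mean = (x + 1) / (n + 2)     # mean of the exact Beta(x + 1, n - x + 1) posterior
print("grid posterior mean: ", grid_mean)
print("exact posterior mean:", exact_mean)
```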


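Example. Spline model on weights

[Figure: schematic with labels Z, Y, Z, Sample, Population]

Horvitz-Thompson (HT) estimator of the population mean:

$\bar{y}_{\text{HT}} = \frac{1}{N} \sum_{i=1}^{n} y_i / \pi_i$; $\pi_i$ = selection prob

A modeling alternative to the HT estimator is to create predictions from a more robust model relating $Y$ to $Z$:

$\bar{y}_{\text{mod}} = \frac{1}{N} \left( \sum_{i=1}^{n} y_i + \sum_{i=n+1}^{N} \hat{y}_i \right)$, $\hat{y}_i$ = predictions from model, e.g.:

$y_i \sim \text{Nor}(\beta \pi_i, \sigma^2 \pi_i^2)$; leads to $\bar{y}_{\text{HT}}$

$y_i \sim \text{Nor}(S(\pi_i), \sigma^2 \pi_i^k)$; $S(\pi_i)$ = penalized spline of $Y$ on $Z$

Simulations in Zheng and Little (2005) suggest better RMSE, confidence coverage for spline model compared with design-based approaches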
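The two estimators of the mean differ only in how they fill in the non-sampled units: HT weights each sampled value by the inverse of its selection probability, while the prediction estimator adds model predictions for the non-sampled units. The sketch below shows the arithmetic only. The predictions are supplied as hypothetical numbers rather than fitted from a penalized spline, and the function names are illustrative.

```python
def ht_mean(y_sampled, pi_sampled, N):
    """Horvitz-Thompson estimate of the population mean."""
    return sum(y / p for y, p in zip(y_sampled, pi_sampled)) / N

def prediction_mean(y_sampled, y_pred_nonsampled, N):
    """Prediction estimate: observed values plus model predictions
    for the N - n non-sampled units, divided by N."""
    if len(y_sampled) + len(y_pred_nonsampled) != N:
        raise ValueError("need one prediction per non-sampled unit")
    return (sum(y_sampled) + sum(y_pred_nonsampled)) / N

# Hypothetical sample of n = 2 units from a population of N = 6
y_sampled = [10.0, 20.0]
pi_sampled = [0.5, 0.25]                   # selection probabilities
y_pred_nonsampled = [12.0, 12.0, 12.0, 12.0]   # stand-in for model predictions

print("HT mean:        ", ht_mean(y_sampled, pi_sampled, N=6))
print("prediction mean:", prediction_mean(y_sampled, y_pred_nonsampled, N=6))
```

In practice the predictions would come from a fitted model such as the penalized spline on the selection probabilities, which is where the robustness gain reported by Zheng and Little (2005) arises.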


The model-based perspective: pros

• Flexible, unified approach for all survey problems

– Models for nonresponse, response and matching errors, small area models, combining data sources

• Bayesian approach is not asymptotic, provides better small-sample inferences

• Probability sampling is justified as making sampling mechanism ignorable, improving robustness


Models bring survey inference closer to the statistical mainstream

[Figure: cartoon. “B/F Gorilla”: “Follow my (frequentist) statistical standards”. Reply: “Why? I am an economist, I build models!”]

The model-based perspective: cons

• Explicit dependence on the choice of model, which has subjective elements (but assumptions are explicit, not buried in a formula)

• Bad models provide bad answers – justifiable concerns about the effect of model misspecification

• Models are needed for all survey variables – need to understand the data, and potential for more complex computations










The current “status quo”: design-model compromise

• Design-based for large samples, descriptive statistics

– But may be model assisted, e.g. regression calibration: model estimates adjusted to protect against misspecification (e.g. Särndal, Swensson and Wretman 1992).

• Model-based for small area estimation, nonresponse, time series,…

• Attempts to capitalize on best features of both paradigms… but … at the expense of “inferential schizophrenia” (Little 2012)?


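GREG (generalized regression) estimator of the population total:

$\hat{T}_{\text{GREG}} = \sum_{i=1}^{N} \hat{y}_i + \sum_{i=1}^{N} I_i (y_i - \hat{y}_i) / \pi_i$, $\hat{y}_i$ = model prediction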
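The GREG estimator can be read as "model total plus a design-weighted correction": the sum of predictions over the whole population, adjusted by the inverse-probability-weighted residuals of the sampled units. A minimal sketch, with hypothetical predictions and selection probabilities and an illustrative function name:

```python
def greg_total(y_hat_all, y_sampled, y_hat_sampled, pi_sampled):
    """GREG estimate of a population total.

    y_hat_all     : model predictions for all N population units
    y_sampled     : observed values for the sampled units
    y_hat_sampled : model predictions for the same sampled units
    pi_sampled    : selection probabilities of the sampled units
    """
    model_total = sum(y_hat_all)
    # Weighted residuals protect against misspecification of the model
    correction = sum((y - yh) / p
                     for y, yh, p in zip(y_sampled, y_hat_sampled, pi_sampled))
    return model_total + correction

# Hypothetical population of N = 5 with a constant prediction of 10
estimate = greg_total(
    y_hat_all=[10.0] * 5,
    y_sampled=[12.0, 9.0],
    y_hat_sampled=[10.0, 10.0],
    pi_sampled=[0.4, 0.4],
)
print("GREG total:", estimate)
```

If the model predicted the sampled values perfectly, the correction would vanish and the estimate would reduce to the model total.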

Example: when is an area “small”?

[Figure: “n-o-meter” scale of sample size n, with design-based inference above and model-based inference below the threshold n0 = “Point of inferential schizophrenia”]

How do I choose n0?

If n0 = 35, should my entire statistical philosophy and inference be different when n=34 and n=36?

n=36, CI: [ ] (wider since based on direct estimate)

n=34, CI: [ ] (narrower since based on model)


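Multilevel (hierarchical Bayes) models

Bayesian multilevel model estimates borrow strength increasingly from model as n decreases:

$\tilde{\mu}_a = w_a \bar{y}_a + (1 - w_a) \hat{\mu}_a$, where $\bar{y}_a$ is the direct estimate and $\hat{\mu}_a$ the model estimate for area $a$

[Figure: “n-o-meter” plot of the weight $w_a$ against sample size n, ranging from 0 (model estimate) to 1 (direct estimate)]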
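The smooth transition from model estimate to direct estimate can be sketched with the standard normal-normal hierarchical model, in which the weight on the direct estimate is the between-area variance divided by the total variance of the area mean. This specific form of the weight, the known-variance simplification and the parameter values are assumptions made for illustration; the slide does not give a formula for the weight.

```python
def shrinkage_weight(n, sigma2, tau2):
    """Weight on the direct estimate, ASSUMING a normal-normal model with
    within-area variance sigma2 and between-area variance tau2 (both known)."""
    return tau2 / (tau2 + sigma2 / n)

def multilevel_estimate(ybar, n, mu_hat, sigma2, tau2):
    """Weighted combination of the direct estimate ybar and model estimate mu_hat."""
    w = shrinkage_weight(n, sigma2, tau2)
    return w * ybar + (1 - w) * mu_hat

# Hypothetical area: direct estimate 20, model estimate 10
for n in (1, 5, 34, 36, 500):
    w = shrinkage_weight(n, sigma2=100.0, tau2=10.0)
    est = multilevel_estimate(20.0, n, 10.0, sigma2=100.0, tau2=10.0)
    print(f"n={n:4d}  weight on direct estimate={w:.3f}  estimate={est:.2f}")
```

The weight changes only slightly between n=34 and n=36, so no threshold n0 has to be chosen: the data determine how much to borrow from the model.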










An alternative paradigm: Calibrated Bayes

• Frequentists should be Bayesian

– Bayes is optimal under a correctly specified model

• Bayesians should be frequentist

– We never know the model (and all models are wrong)

– Inferences should be robust to misspecification, have good repeated sampling characteristics

• Calibrated Bayes (Box 1980, Rubin 1984, Little 2006, 2012, 2013)

– Inference based on a Bayesian model

– Model chosen to yield inferences that are well-calibrated in a frequentist sense

– Aim for posterior probability intervals that have (approximately) nominal frequentist coverage



Bayes/frequentist compromises

“I believe that … sampling theory is needed for exploration and ultimate criticism of the entertained model in the light of the current data, while Bayes’ theory is needed for estimation of parameters conditional on adequacy of the model.”

George Box (1980)

Calibrated Bayes

“The applied statistician should be Bayesian in principle and calibrated to the real world in practice – appropriate frequency calculations help to define such a tie.”


“… frequency calculations are useful for making Bayesian statements scientific, … in the sense of capable of being shown wrong by empirical test; here the technique is the calibration of Bayesian probabilities to the frequencies of actual events.”

Rubin (1984)


Calibrated Bayes models for surveys should incorporate sample design features

• The “Calibrated” part of Calibrated Bayes requires robust models with good repeated sampling properties:

• Generally weak priors that are dominated by the likelihood (“objective Bayes”)

• Models that incorporate sampling design features:

– Capture design weights and stratifying variables as covariates in the prediction model (e.g. Gelman 2007)

– Clustering via hierarchical random effects models










Applications

• Voting Rights Act special tabulation

• The American Community Survey (ACS) and the “standard error error”


Voting Rights Act Special Tabulation

• Section 203 Language Provisions of the Voting Rights Act

• Determines counties and townships required to provide language assistance at the polls

• Determinations are based in part on the following “more than 5%” provision:

… More than 5 percent of voting age citizens of political district are members of a single language minority and are Limited English Proficient (LEP).


Voting Rights Act Tabulations

• Previously used direct estimates from Long Form Decennial Census Data

• Used ACS 2005-2009 and 2010 Census data to produce estimates by fall 2011

• Direct estimates for some districts are based on small ACS sample and hence have unacceptably high variance

• E.g. let P be proportion of voting age citizens in political district who are members of a single language minority and are Limited English Proficient

• If the ACS were a simple random sample, a direct estimate of P would be the sample proportion m/n

– District A with n=105, m=5, m/n < 0.05

– District B with n=105, m=6, m/n > 0.05

– Direct ACS estimation is more complex, but same idea applies


Voting Rights Tabulations

• Overview of approach to the “more than 5%” provision:

• Build a district level regression model to predict P based on variables in the ACS

• Classify districts into classes with similar predicted P based on the model [predictive mean stratification]

• Within classes, apply a Beta-Binomial model that pulls the direct ACS estimate of P towards the average P for districts in that class

• Compare Beta-Binomial model estimate with 5% for this aspect of the determination

• Rationale: increased precision of Beta-Binomial estimates in small samples increases the probability of getting the determination right, particularly in small districts

• See Joyce et al. (2014)
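The pull of a direct estimate toward the class average can be sketched with a conjugate Beta-Binomial calculation. The class mean of 0.03 and the prior "sample size" of 50 below are hypothetical values chosen for illustration, and the function name is illustrative; the methodology in Joyce et al. (2014) is more elaborate than this.

```python
def beta_binomial_estimate(m, n, class_mean, prior_size):
    """Posterior mean of P under a Beta prior centred on the class average.

    class_mean and prior_size are HYPOTHETICAL prior settings: the prior is
    Beta(a, b) with a = class_mean * prior_size, b = (1 - class_mean) * prior_size.
    """
    a = class_mean * prior_size
    b = (1 - class_mean) * prior_size
    return (a + m) / (a + b + n)

threshold = 0.05
for name, m, n in (("District A", 5, 105), ("District B", 6, 105)):
    direct = m / n
    shrunk = beta_binomial_estimate(m, n, class_mean=0.03, prior_size=50)
    print(f"{name}: direct={direct:.4f} (> 5%: {direct > threshold}), "
          f"Beta-Binomial={shrunk:.4f} (> 5%: {shrunk > threshold})")
```

With direct estimates, one extra sampled person moves a district across the 5% line; the model estimate is less sensitive to that single observation because it also uses information from similar districts.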


Bayes forces a loss function

• With small p and n, the posterior distribution is skewed to the right

[Figure: right-skewed posterior density with the mode, median and mean marked, in that order from left to right]

• What’s the right point estimate: median, mode, mean? Bayes forces a choice …

• Design-based, superpopulation model approaches fail to address the issue

– Maximum likelihood is equivalent to mode with flat prior, which does not correspond to a sensible loss function
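The three candidate point estimates correspond to different loss functions: the posterior mean minimizes expected squared error, the median minimizes expected absolute error, and the mode corresponds to a 0-1 type loss. The sketch below computes all three for a skewed Beta posterior, assuming a flat prior and hypothetical data (2 successes in 40 trials). The median is found by simple numerical integration so that only the standard library is needed; `beta_quantile` is an illustrative helper, not a library function.

```python
import math

def beta_quantile(q, a, b, grid=100_000):
    """Approximate q-quantile of Beta(a, b) for a, b >= 1, using a
    midpoint-rule integration of the density (accuracy about 1/grid)."""
    h = 1.0 / grid
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    cumulative = 0.0
    for i in range(grid):
        t = (i + 0.5) * h
        density = math.exp(log_norm + (a - 1) * math.log(t)
                           + (b - 1) * math.log(1 - t))
        cumulative += density * h
        if cumulative >= q:
            return t
    return 1.0

# Hypothetical data: x = 2 successes in n = 40, flat Beta(1, 1) prior
x, n = 2, 40
a, b = 1 + x, 1 + n - x              # posterior is Beta(3, 39)

mode = (a - 1) / (a + b - 2)         # equals the maximum likelihood estimate x/n
median = beta_quantile(0.5, a, b)
mean = a / (a + b)
print(f"mode={mode:.4f}  median={median:.4f}  mean={mean:.4f}")
```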


American Community Survey

• US Census Bureau is making available thousands of ACS tables, with millions of cells

• A high fraction of these estimates are based on very little data, and hence are very noisy

– Many people want information, not data, so ACS should produce information products, as well as data products

– When noise swamps the signal, the information content is buried

– Data products are highly constrained by confidentiality requirements, leading to incompleteness


The Statistical Problem

• The ACS philosophy is essentially to produce “direct” (“design-based”) estimates, together with margins of error

• This works fine with large samples, but most of the ACS estimates are based on small samples

– The estimates are often too noisy to be useful

– The confidence intervals derived from the estimates and margins of error are known to be of poor quality, violating statistical standards

• Intervals include proportions outside the range (0,1)

• Intervals do not have nominal coverage


The “standard error” error

• ACS reports estimates and margins of error that yield asymptotic 90% confidence intervals

• But in small samples, the implied confidence intervals do not have the stated coverage; so

• Seek to replace estimates and margins of error by posterior means and 5% to 95% credibility intervals that have approximately the nominal coverage

• A non-Bayesian can interpret the posterior means as estimates, and the 90% credibility intervals as 90% confidence intervals.
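The coverage problem can be checked exactly in the simplest setting. The sketch below assumes binomial sampling (the ACS design is more complex) and computes the true coverage of the nominal 90% Wald interval, estimate plus or minus 1.645 standard errors, by summing binomial probabilities over the outcomes whose interval contains the true proportion. The values n=30 and p=0.05 are hypothetical.

```python
import math

def wald_interval(x, n, z=1.645):
    """Nominal 90% Wald interval for a proportion from x successes in n trials."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def wald_coverage(n, p, z=1.645):
    """Exact coverage probability of the Wald interval under Binomial(n, p)."""
    coverage = 0.0
    for x in range(n + 1):
        lo, hi = wald_interval(x, n, z)
        if lo <= p <= hi:
            coverage += math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    return coverage

n, p = 30, 0.05                      # hypothetical small-sample setting
print("nominal coverage: 0.90")
print("actual coverage: ", round(wald_coverage(n, p), 3))
print("interval when x=1:", wald_interval(1, n))   # lower limit is negative
```

Two of the defects listed above show up directly: the actual coverage falls short of the nominal level, and the interval for a small count extends below zero.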


Binary outcome: Schmertmann example

[Figure: example annotated “Margins of error exceed the estimates”]

Data for example

$Y$ = outcome (e.g. poverty)

$x$ = covariates (e.g. categorized age = a, gender = g, stratum = h)

In county $c$:

$n_{aghc}$ = sample count with age = a, gender = g, stratum = h

$x_{aghc}$ = sample count in poverty with age = a, gender = g, stratum = h

$\hat{p}_{aghc} = x_{aghc} / n_{aghc}$ = sample proportion

Fully Bayesian model

$x_{aghc} \mid p_{aghc} \sim \text{Bin}(n_{aghc}, p_{aghc})$

$p_{aghc} \sim \text{Beta}(\alpha_{agh}, \beta_{agh}) \equiv \text{Beta}^*(\mu_{agh}, \tau)$

[Assumption: …]

$p_{aghc} \mid x_{aghc} \sim \text{Beta}\left(x_{aghc} + \tau \mu_{agh},\ n_{aghc} - x_{aghc} + \tau (1 - \mu_{agh})\right)$

Key is how to determine prior parameters $(\alpha_{agh}, \beta_{agh})$ (or $\mu_{agh}, \tau$)

(a) Empirical Bayes: estimate prior parameters, then treat as if known. Simple beta intervals, but understates uncertainty

(b) Full Bayes: incorporate uncertainty of prior parameter estimates. More work, but better reflects uncertainty; consider approximations, since full Bayes seems computationally complex

Pragmatic “pseudo-Bayes” approach

Tom Louis suggested this simple “Bayes-like” approach:

A. Compute design-based estimate of proportion and standard error using existing methods

B. Pretend data are binomial with number of successes x* and sample size n* that lead to the estimates in A.

C. Compute Beta posterior distribution with noninformative prior (e.g. uniform or Jeffreys)

D. Compute 90% posterior credibility interval based on this Beta posterior (reflects asymmetry, always between 0 and 1)

Simple to implement and easily beats standard Wald-type confidence intervals in simulations (Franco, Little, Louis and Slud 2015, in preparation)
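Steps A to D can be sketched in a few lines. The matching rule used in step B, which sets the binomial variance p(1-p)/n* equal to the squared standard error, is an assumption about how the pseudo-data are obtained; the slide only says they should lead to the estimates in A. The Jeffreys prior is used, the posterior quantiles are approximated by Monte Carlo draws from the standard library, and the inputs (estimate 0.04, standard error 0.03) are hypothetical.

```python
import random

def pseudo_bayes_interval(p_hat, se, level=0.90, draws=20000, seed=1):
    """Pseudo-Bayes credibility interval from a design-based estimate and SE.

    Requires 0 < p_hat < 1 and se > 0.
    """
    # Step B (ASSUMED matching rule): binomial pseudo-data with the same
    # estimate and standard error, i.e. se^2 = p_hat * (1 - p_hat) / n_star
    n_star = p_hat * (1 - p_hat) / se ** 2
    x_star = n_star * p_hat
    # Step C: Jeffreys Beta(1/2, 1/2) prior gives a Beta posterior
    a = x_star + 0.5
    b = n_star - x_star + 0.5
    # Step D: approximate the posterior quantiles by sorted Monte Carlo draws
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - level) / 2
    lower = sample[int(tail * draws)]
    upper = sample[int((1 - tail) * draws) - 1]
    return lower, upper

p_hat, se = 0.04, 0.03               # hypothetical step A output
wald = (p_hat - 1.645 * se, p_hat + 1.645 * se)
print("Wald 90% interval:        ", wald)
print("pseudo-Bayes 90% interval:", pseudo_bayes_interval(p_hat, se))
```

The Wald interval for these inputs has a negative lower limit, whereas the Beta-based interval is asymmetric and stays inside (0, 1) by construction.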


Barriers to Calibrated Bayes

• It’s a major paradigm shift

• It’s too much work/computation

– but this concern is alleviated by gains in computing power and advances in Bayesian computational methods

• More explicit dependence on the choice of model -- concerns with model misspecification

– “Design-based is model-free and hence robust…model-based requires models, which are inherently subjective”

• But models are essential for today’s data, and

• a judicious Calibrated Bayes model is robust and incorporates key design features – and would bring official statistics back in the statistical mainstream


References

Box, G.E.P. (1980). Sampling and Bayes inference in scientific modeling and robustness (with discussion). JRSSA, 143, 383-430.

Joyce, P.M., Malec, D., Little, R.J., Gilary, A., Navarro, A. and Asiala, M.E. (2014). Statistical Modeling Methodology for the Voting Rights Act Section 203 Language Assistance Determinations. JASA, 109, 36-47.

Gelman, A. (2007). Struggles with survey weighting and regression modeling. Statist. Sci., 22, 2, 153-164 (with discussion and rejoinder).

Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2003), Bayesian Data Analysis, 2nd. edition. New York: CRC Press.

Godambe, V.P. (1955). A unified theory of sampling from finite populations. JRSSB, 17, 269-278.

Horvitz, D.G. & Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. JASA, 47, 663-685.

Little, R.J.A. (2004). To Model or Not to Model? Competing Modes of Inference for Finite Population Sampling. JASA, 99, 546-556.

Little, R.J.A. (2006). Calibrated Bayes: A Bayes/frequentist roadmap. Am. Statist., 60, 3, 213-223

_____ (2012). Calibrated Bayes: an alternative inferential paradigm for official statistics (with discussion and rejoinder). JOS, 28, 3, 309-372.

_____ (2013). Survey Sampling: Past Controversies, Current Orthodoxies, and Future Paradigms. In Past, Present and Future of Statistical Science, COPSS 50th Anniversary Volume, X. Lin, D. L. Banks, C. Genest, G. Molenberghs, D.W. Scott, and J.-L. Wang, eds. CRC Press.

Rubin, DB (1984), Bayesianly justifiable and relevant frequency calculations for the applied statistician, Annals Statist. 12, 1151-1172.

Särndal, C.-E., Swensson, B. & Wretman, J.H. (1992), Model Assisted Survey Sampling, Springer Verlag: New York.

Zheng, H. & Little, R.J. (2005). Inference for the population total from probability-proportional-to-size samples based on predictions from a penalized spline nonparametric model. JOS, 21, 1-20.

