Raymond J. Carroll Texas A&M University and University of Technology Sydney

http://stat.tamu.edu/~carroll

Bayesian Methods for Density and Regression Deconvolution

Co-Authors

Bani Mallick, Abhra Sarkar, John Staudenmayer, Debdeep Pati

Longtime Collaborators in Deconvolution

Peter Hall, Aurore Delaigle, Len Stefanski

Overview

• My main application interest is in nutrition

• Nutritional intake is necessarily multivariate

• Smart nutritionists have recognized that in cancers, it is the patterns of nutrition that matter, not single causes such as saturated fat

• To affect public health practice, nutritionists have developed scores that characterize how well one eats

• Healthy Eating Index (HEI), DASH score, Mediterranean score, etc.

Overview

• One day of French fries/Chips will not kill you

• It is your long-term average pattern that is important

• In population public health science, long term averages cannot be measured

• The best you can get is some version of self-report, e.g., multiple 24 hour recalls

• This fact has been the driver behind much of measurement error modeling, especially including density deconvolution

Overview

• Analysis is complicated by the fact that on a given day, people will not consume certain foods, e.g., whole grains, legumes, etc.

• My long term goal has been to develop methods that take into account measurement error, the multivariate nature of nutrition, and excess zeros.

Why it Matters

• What % of U.S. kids have alarmingly bad diets?
  • Ignore measurement error: 28%
  • Account for it: 8%

• What are the relative rates of colon cancer for those with an HEI score of 70 versus those with 40?
  • Ignore measurement error: decrease of 10%
  • Account for it: decrease of 35%
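
A minimal sketch of why ignoring measurement error inflates the estimated fraction of people below a cutoff. It assumes a normal long-term intake X and an independent normal error U, so that a single-day measurement W = X + U has variance var(X) + var(U). The mean, variances and cutoff below are hypothetical and are not the numbers behind the 28% and 8% quoted above.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_below(cutoff, mean, var_x, var_u=0.0):
    """P(score < cutoff) when the score is normal with variance var_x + var_u."""
    return normal_cdf((cutoff - mean) / math.sqrt(var_x + var_u))


# Hypothetical values, chosen only for illustration.
mean, var_x, var_u, cutoff = 60.0, 100.0, 150.0, 45.0

true_frac = fraction_below(cutoff, mean, var_x)           # based on long-term X
naive_frac = fraction_below(cutoff, mean, var_x, var_u)   # based on one-day W

print(f"true fraction below cutoff:  {true_frac:.3f}")
print(f"naive fraction below cutoff: {naive_frac:.3f}")
```

Because W is more spread out than X, the naive tail fraction is larger than the true one whenever the cutoff lies below the mean.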

Overview

• We have perfectly serviceable and practical methods that involve transformations, random effects, latent variables and measurement errors

• The methods are widely and internationally used in nutritional surveillance and nutritional epidemiology

• For the multivariate case, computation is “Bayesian”

• Eventually though, anything random is assumed to be Gaussian

• Can we not do better?

Background

• In the classical measurement error – deconvolution problem, there is a variable, X, that is not observable

• Instead, a proxy for it, W, is observed

• In the density problem, the goal is to estimate the density of X using only observations on W

• In population science contexts, the distribution of X given covariates Z is also important (very small literature on this)

Background

• In the regression problem, there is a response Y

• One goal is to estimate E(Y | X)

• Another goal is to estimate the distribution of Y given X, because variances are not always nuisance parameters

Background

• In the classic problem, W = X + U, with U independent of X.

• Deconvoluting kernel methods that result in consistent estimation of the density of X were discovered in 1988 (Stefanski, Hall, Fan and )

• They are kernel density estimates with kernel function $K_{\mathrm{decon}}(x)$

Background

• In the classic problem, W = X + U, with U independent of X.

• The deconvoluting kernel is a corrected score for an ordinary kernel density function, with the property that for a bandwidth h,

$$E\left[K_{\mathrm{decon}}\{(W - x_0)/h\} \mid X\right] = K\{(X - x_0)/h\}$$

• Lots of results on rates of convergence, etc.
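
A minimal sketch of the corrected-score property, under assumptions that are not in the slides: a Gaussian kernel K and a Laplace(0, b) error U with known scale b. In that special case the deconvoluting kernel has the closed form K(z) - (b/h)^2 K''(z). The scale `b`, bandwidth `h` and sample sizes are hypothetical. The code checks the property by Monte Carlo at one fixed X, then forms a density estimate from contaminated data.

```python
import math
import random


def gauss_kernel(z):
    """Standard normal density, used as the ordinary kernel K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def decon_kernel(z, b, h):
    """Deconvoluting kernel for Gaussian K and Laplace(0, b) error:
    K(z) - (b/h)^2 K''(z), with K''(z) = (z^2 - 1) K(z)."""
    return gauss_kernel(z) * (1.0 - (b / h) ** 2 * (z * z - 1.0))


def decon_kde(x0, w, b, h):
    """Deconvoluting kernel density estimate of the density of X at x0."""
    return sum(decon_kernel((wj - x0) / h, b, h) for wj in w) / (len(w) * h)


def laplace(rng, b):
    """Laplace(0, b) draw as the difference of two exponentials with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


rng = random.Random(0)
b, h = 0.3, 0.5  # hypothetical error scale and bandwidth

# Monte Carlo check of E[K_decon((W - x0)/h) | X] = K((X - x0)/h) at one X.
x_fixed, x0, n_mc = 0.3, 0.0, 200000
mc_mean = sum(
    decon_kernel((x_fixed + laplace(rng, b) - x0) / h, b, h) for _ in range(n_mc)
) / n_mc
target = gauss_kernel((x_fixed - x0) / h)

# Density estimate from contaminated data W = X + U, with X ~ N(0, 1).
w = [rng.gauss(0.0, 1.0) + laplace(rng, b) for _ in range(300)]
grid = [-10.0 + 0.1 * i for i in range(201)]
f_hat = [decon_kde(g, w, b, h) for g in grid]
total_mass = 0.1 * sum(f_hat)  # Riemann sum; the estimate integrates to 1

print(mc_mean, target, total_mass)
```

The estimate can dip below zero in places, which is a known feature of deconvoluting kernels rather than a bug in the sketch.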

Background

• There is an R package called decon

• However, a paper to appear by A. Delaigle discusses problems with the package’s bandwidth selectors

• Her web site has Matlab code for cases where the measurement error is independent of X, including bandwidth selection

Problem Considered Here

• Here is a general class of models. Here are W and X

• The W’s are independent given X

$$W_{ij} = X_i + U_{ij}(X_i)$$

$$E\{U_{ij}(X_i) \mid X_i\} = 0$$

$$\mathrm{var}\{U_{ij}(X_i) \mid X_i\} = \sigma_u^2(X_i)$$
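
A minimal simulation sketch of this model class, showing why replicates carry information about the error variance: given X, half the squared difference of two replicates has expectation equal to the conditional error variance. The variance function `sigma2_u`, the normal distributions for X and U, and the sample size are all hypothetical choices for illustration. The model itself does not require normality, and this check is not a substitute for the identification results.

```python
import random


def sigma2_u(x):
    """Hypothetical conditional variance of the measurement error given X = x."""
    return 0.25 + 0.5 * x * x


rng = random.Random(1)
n = 20000

half_sq_diffs = []
true_vars = []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)           # latent X_i (never observed in practice)
    sd = sigma2_u(x) ** 0.5
    w1 = x + rng.gauss(0.0, sd)       # replicate W_i1
    w2 = x + rng.gauss(0.0, sd)       # replicate W_i2, independent given X_i
    half_sq_diffs.append(0.5 * (w1 - w2) ** 2)
    true_vars.append(sigma2_u(x))

mean_half_sq = sum(half_sq_diffs) / n
mean_true_var = sum(true_vars) / n

print(f"average of (W_i1 - W_i2)^2 / 2: {mean_half_sq:.3f}")
print(f"average of sigma_u^2(X_i):      {mean_true_var:.3f}")
```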

Background

• There is a substantial econometric literature on technical conditions for identification in many different contexts (S. Schennach, X. Chen, Y. Hu)

• The problem I have stated is known to be nonparametrically identified if there are 3 replicates (and certain technical completeness assumptions hold)


• Here is a general class of models. First, Y

• The classical heteroscedastic model where the variance is important

• Identified if there are 2 replicate W’s

$$Y_i = g(X_i) + \varepsilon_i(X_i)$$

$$E\{\varepsilon_i(X_i) \mid X_i\} = 0$$

$$\mathrm{var}\{\varepsilon_i(X_i) \mid X_i\} = \sigma_\varepsilon^2(X_i)$$

Background

• The econometric literature invariably uses sieves with orthogonal basis functions

• The theory follows X. Shen’s 1997 paper

Background

• In practice, as with non-penalized splines, 5-7 basis functions are used to represent all densities and functions

• Constraints (such as being positive and integrating to 1 for densities) are often ignored

• In the problem I eventually want to solve, the dimension of the two densities = 19 (latent stuff all around)

• Maybe use multivariate Hermite series?


• There is no deconvoluting kernel method that does density or regression deconvolution in the context that the distribution of the measurement error depends on X


• It seems to me that there are two ways to handle this problem in general
  • Sieves: be an econometrician
  • Bayesian with flexible models

• Our methodology is explicitly Bayesian, but borrows basis function ideas from the sieve approach

Model Formulation

• We borrow from Hu and Schennach’s example and also Staudenmayer, Ruppert and Buonaccorsi

$$W_{ij} = X_i + s_u^{1/2}(X_i)\,U_{ij}$$

$$Y_i = g(X_i) + s_\varepsilon^{1/2}(X_i)\,\varepsilon_i$$

• Here, U is assumed independent of X

• Also, ε is independent of X

Model Formulation

• Our model is

$$W_{ij} = X_i + s_u^{1/2}(X_i)\,U_{ij}$$

$$Y_i = g(X_i) + s_\varepsilon^{1/2}(X_i)\,\varepsilon_i$$

• Like previous authors, we model $s_\varepsilon(X_i)$ and $s_u(X_i)$ as B-splines with positive coefficients

• We model $g(X_i)$ as a B-spline

• As frequentists, we could model the densities of X, U, and ε by sieves, and appeal to Hu and Schennach for theory

• We have not investigated this
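
A minimal sketch of why B-splines with positive coefficients are a convenient model for a variance function. B-spline basis functions are nonnegative and sum to one inside the knot range, so positive coefficients give a positive function bounded by the smallest and largest coefficient. The knots, the degree, and the unconstrained parameters `theta` (exponentiated to get positive coefficients) are hypothetical. This is not the authors' R code, and the exponential reparametrization is only one possible way to enforce positivity.

```python
import math


def bspline_basis(x, knots, degree):
    """All B-spline basis functions at x, via the Cox-de Boor recursion."""
    # Degree 0: indicators of the half-open knot intervals.
    basis = [
        1.0 if knots[i] <= x < knots[i + 1] else 0.0
        for i in range(len(knots) - 1)
    ]
    for d in range(1, degree + 1):
        new_basis = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            den_left = knots[i + d] - knots[i]
            den_right = knots[i + d + 1] - knots[i + 1]
            if den_left > 0.0:
                left = (x - knots[i]) / den_left * basis[i]
            if den_right > 0.0:
                right = (knots[i + d + 1] - x) / den_right * basis[i + 1]
            new_basis.append(left + right)
        basis = new_basis
    return basis


degree = 3
knots = [0.0] * 4 + [0.25, 0.5, 0.75] + [1.0] * 4   # clamped knots: 7 basis functions
theta = [-1.0, 0.5, 0.0, 1.2, -0.3, 0.8, -2.0]      # hypothetical unconstrained values
coef = [math.exp(t) for t in theta]                 # positive spline coefficients


def variance_function(x):
    """Spline with positive coefficients evaluated at x in [0, 1)."""
    return sum(c * b for c, b in zip(coef, bspline_basis(x, knots, degree)))


grid = [i / 200 for i in range(200)]                # stays inside [0, 1)
values = [variance_function(x) for x in grid]
basis_sums = [sum(bspline_basis(x, knots, degree)) for x in grid]

print(min(values), max(values))
```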

Model Formulation

• Our model is

$$W_{ij} = X_i + s_u^{1/2}(X_i)\,U_{ij}$$

$$Y_i = g(X_i) + s_\varepsilon^{1/2}(X_i)\,\varepsilon_i$$

• As Bayesians, we have modeled the densities of X, U, and ε by Dirichlet process mixture models (DPMM)

• We have found that mixtures of normals, with an unknown number of components, are much faster, just as effective, and very stable numerically

Model Formulation

• We found that fixing the number of components at a largish number works best

• The method concentrates on a lower number of components (Rousseau and Mengersen found this in a non-measurement error context)

• There are lots of issues involved: (a) starting values; (b) hyper-parameters; (c) MH candidates; (d) constraints (e.g., zero means); (e) data standardization, etc.
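
A minimal sketch of two of the ingredients mentioned above: a mixture of normals with a fixed, largish number of components, and a zero-mean constraint of the kind needed for an error density. It draws the weights once from a sparse symmetric Dirichlet prior and recenters the component means. The number of components, the Dirichlet parameter and the component means and standard deviations are hypothetical. This is a single prior draw, not the MCMC sampler, so it only hints at how mass can concentrate on few components.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a finite mixture of normals."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


rng = random.Random(2)
K = 20          # fixed, largish number of components (hypothetical)
alpha = 0.1     # small Dirichlet parameter, which favors sparse weights

# Symmetric Dirichlet draw via normalized gamma variables.
gammas = [rng.gammavariate(alpha, 1.0) for _ in range(K)]
weights = [g / sum(gammas) for g in gammas]

raw_means = [rng.gauss(0.0, 1.0) for _ in range(K)]
sds = [0.3 + rng.random() for _ in range(K)]

# Zero-mean constraint: subtract the mixture mean from every component mean.
overall_mean = sum(w * m for w, m in zip(weights, raw_means))
means = [m - overall_mean for m in raw_means]
mixture_mean = sum(w * m for w, m in zip(weights, means))

n_active = sum(1 for w in weights if w > 0.01)
print(f"components with weight > 0.01: {n_active} of {K}")
print(f"mixture mean after centering: {mixture_mean:.2e}")
print(f"density at 0: {mixture_pdf(0.0, weights, means, sds):.3f}")
```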

Model Formulation

• Here is a simulation example of density deconvolution and homoscedasticity with a mixture of normals for X and a Laplace for U

• The settings come from a paper not by us

• There are 3 replicates, so the density of U is also estimated by our method (we let the deconvoluting kernel density estimator, DKDE, know the truth)

• I ran our R code as is, with no fine tuning

Model Formulation

Model Formulation

• Here is another example

• Y = sodium intake as measured by a food frequency questionnaire (FFQ, known to be biased)

• W = same thing, but measured by a 24 hour recall (known to be almost unbiased)

• We have R code for this

Model Formulation

The dashed line is the Y=X line, indicating the bias of the FFQ

Multivariate Deconvolution

• There are also multivariate problems of density deconvolution

• We have found 4 papers about this

• 3 deconvoluting kernel papers, all assume the density of the measurement errors is known

• 1 of those papers has a bandwidth selector

• Bovy et al (2011, AoAS) model X as a mixture of normals, and assume U is independent of X and Gaussian with known covariance matrix. They use an EM algorithm.
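
Under the Bovy et al. setup as described here (X a mixture of normals, U independent of X and Gaussian with known covariance), the density of the observed W has a closed form, which is one reason an EM algorithm is convenient. The symbols $\pi_k$, $\mu_k$, $\Sigma_k$ and $\Sigma_u$ are notation introduced for this sketch and do not appear in the slides:

```latex
f_X(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad W = X + U, \qquad U \sim \mathcal{N}(0, \Sigma_u)
\;\Longrightarrow\;
f_W(w) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(w \mid \mu_k, \Sigma_k + \Sigma_u).
```

Each component keeps its weight and mean, and the known error covariance is simply added to the component covariance.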

Multivariate Deconvolution

• We have generalized our 1-dimensional deconvolution approach as

$$W_{ijk} = X_{ij} + s_{uj}^{1/2}(X_{ij})\,U_{ijk}$$

• Again, X is a mixture of multivariate normals, as is U

• However, standard multivariate inverse Wishart computations fail miserably

Multivariate Deconvolution

• We have generalized our 1-dimensional deconvolution approach as

$$W_{ijk} = X_{ij} + s_{uj}^{1/2}(X_{ij})\,U_{ijk}$$

• We use a factor analytic representation of the component specific covariance matrices with sparsity inducing shrinkage priors on the factor loading matrices (A. Bhattacharya and D. Dunson)

• This is crucial in flexibly lowering the dimension of the covariance matrices
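
A minimal sketch of how a factor analytic representation lowers the dimension of a covariance matrix: a p x p covariance is written as loadings times loadings-transpose plus a diagonal matrix of uniquenesses. The dimensions (p = 19, q = 3) and the random loadings are hypothetical, the parameter counts are raw counts that ignore rotational non-identifiability, and the shrinkage priors on the loadings are not implemented here.

```python
import random


def factor_covariance(loadings, uniquenesses):
    """Covariance Lambda Lambda^T + diag(uniquenesses) for a p x q loading matrix."""
    p, q = len(loadings), len(loadings[0])
    return [
        [
            sum(loadings[i][m] * loadings[j][m] for m in range(q))
            + (uniquenesses[i] if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


def n_params_full(p):
    """Free parameters in an unstructured p x p covariance matrix."""
    return p * (p + 1) // 2


def n_params_factor(p, q):
    """Raw parameter count for q factors: p*q loadings plus p uniquenesses."""
    return p * q + p


rng = random.Random(3)
p, q = 19, 3   # hypothetical dimension and number of latent factors

loadings = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(p)]
uniquenesses = [0.5 + rng.random() for _ in range(p)]
sigma = factor_covariance(loadings, uniquenesses)

print(f"unstructured covariance: {n_params_full(p)} parameters")
print(f"factor representation:   {n_params_factor(p, q)} parameters")
```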

Multivariate Deconvolution

Multivariate inverse Wisharts on top, Latent factor model on bottom

Blue = MIW, green = MLFA.

Variables are (a) carbs; (b) fiber; (c) protein and (d) potassium

Conclusion

• I still want to get to my problem of multiple nutrients/foods, excess zeros and measurement error

• Dimension reduction and flexible models seem a practical way to go

• Final point: for health risk estimation and nutritional surveillance, only a 1-dimensional summary is needed, hence better rates of convergence
