Raymond J. Carroll Texas A&M University and University of Technology Sydney http://stat.tamu.edu/~carroll Bayesian Methods for Density and Regression Deconvolution

Page 1: Raymond J. Carroll Texas A&M University and University of Technology Sydney

Raymond J. Carroll
Texas A&M University and University of Technology Sydney
http://stat.tamu.edu/~carroll

Bayesian Methods for Density and Regression Deconvolution

Page 2

Co-Authors

Bani Mallick, Abhra Sarkar, John Staudenmayer, Debdeep Pati

Page 3

Longtime Collaborators in Deconvolution

Peter Hall, Aurore Delaigle, Len Stefanski

Page 4

Overview

• My main application interest is in nutrition

• Nutritional intake is necessarily multivariate

• Smart nutritionists have recognized that in cancers, it is the patterns of nutrition that matter, not single causes such as saturated fat

• To affect public health practice, nutritionists have developed scores that characterize how well one eats

• Healthy Eating Index, DASH score, Mediterranean score, etc.

Page 5

Overview

• One day of French fries/Chips will not kill you

• It is your long-term average pattern that is important

• In population public health science, long term averages cannot be measured

• The best you can get is some version of self-report, e.g., multiple 24 hour recalls

• This fact has been the driver behind much of measurement error modeling, especially including density deconvolution

Page 6

Overview

• Analysis is complicated by the fact that on a given day, people will not consume certain foods, e.g., whole grains, legumes, etc.

• My long term goal has been to develop methods that take into account measurement error, the multivariate nature of nutrition, and excess zeros.

Page 7

Why it Matters

• What % of U.S. kids have alarmingly bad diets?
    • Ignore measurement error: 28%
    • Account for it: 8%

• What are the relative rates of colon cancer for those with an HEI score of 70 versus those with 40?
    • Ignore measurement error: decrease of 10%
    • Account for it: decrease of 35%
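The gap between the two prevalence figures can be mimicked with a small simulation. This is only a sketch: the score distribution, the error size, and the cutoff of 40 are hypothetical values chosen for illustration, not the data behind the percentages on this slide.

```python
import random

def fraction_below(values, cutoff):
    """Share of values strictly below the cutoff."""
    return sum(v < cutoff for v in values) / len(values)

def simulate_scores(n=50_000, seed=1):
    rng = random.Random(seed)
    # Hypothetical long-term average score X and a single-day report W = X + U.
    x = [rng.gauss(50.0, 8.0) for _ in range(n)]
    w = [xi + rng.gauss(0.0, 12.0) for xi in x]
    return x, w

x, w = simulate_scores()
cutoff = 40.0                        # hypothetical threshold for a "bad" diet
naive = fraction_below(w, cutoff)    # treats the error-prone W as if it were X
truth = fraction_below(x, cutoff)    # uses the unobservable X
print(f"ignoring measurement error: {naive:.3f}, using X: {truth:.3f}")
```

Because W is more spread out than X, the naive tail fraction overstates the share of people whose long-term score falls below the cutoff.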

Page 8

Overview

• We have perfectly serviceable and practical methods that involve transformations, random effects, latent variables and measurement errors

• The methods are widely and internationally used in nutritional surveillance and nutritional epidemiology

• For the multivariate case, computation is “Bayesian”

• Eventually though, anything random is assumed to be Gaussian

• Can we not do better?

Page 9

Background

• In the classical measurement error – deconvolution problem, there is a variable, X, that is not observable

• Instead, a proxy for it, W, is observed

• In the density problem, the goal is to estimate the density of X using only observations on W

• In population science contexts, the distribution of X given covariates Z is also important (very small literature on this)

Page 10

Background

• In the regression problem, there is a response Y

• One goal is to estimate E(Y | X)

• Another goal is to estimate the distribution of Y given X, because variances are not always nuisance parameters

Page 11

Background

• In the classic problem, W = X + U, with U independent of X.

• Deconvoluting kernel methods that result in consistent estimation of the density of X were discovered in 1988 (Stefanski, Hall, Fan and )

• They are kernel density estimates with kernel function $K_{\mathrm{decon}}(x)$

Page 12

Background

• In the classic problem, W = X + U, with U independent of X.

• The deconvoluting kernel is a corrected score for an ordinary kernel density function $K$, with the property that for a bandwidth h,

$$E\left[ K_{\mathrm{decon}}\{(W - x_0)/h\} \mid X \right] = K\{(X - x_0)/h\}$$

• Lots of results on rates of convergence, etc.
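The corrected-score property can be checked numerically in one special case. The sketch below assumes a Gaussian kernel K and Laplace(0, b) measurement error, for which the deconvoluting kernel has the closed form K_decon(z) = K(z){1 + (b/h)^2 (1 - z^2)}. The values of b, h, x0 and X are arbitrary illustrations, and the function names are hypothetical.

```python
import math
import random

def gauss_kernel(z):
    """Standard normal density, used as the ordinary kernel K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def decon_kernel(z, b, h):
    """Deconvoluting kernel for Laplace(0, b) error and a Gaussian kernel."""
    return gauss_kernel(z) * (1.0 + (b / h) ** 2 * (1.0 - z * z))

def laplace(rng, b):
    # The difference of two independent exponentials with mean b is Laplace(0, b).
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

def check_corrected_score(x_true=0.4, x0=0.0, b=0.3, h=0.5, n=200_000, seed=7):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = x_true + laplace(rng, b)       # W = X + U with X held fixed
        total += decon_kernel((w - x0) / h, b, h)
    lhs = total / n                        # Monte Carlo E[K_decon((W - x0)/h) | X]
    rhs = gauss_kernel((x_true - x0) / h)  # K((X - x0)/h)
    return lhs, rhs

lhs, rhs = check_corrected_score()
print(f"E[K_decon | X] ~ {lhs:.4f}, K((X - x0)/h) = {rhs:.4f}")
```

The two numbers should agree up to Monte Carlo error, which is the sense in which the deconvoluting kernel "undoes" the error on average.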

Page 13

Background

• There is an R package called decon

• However, a paper to appear by A. Delaigle discusses problems with the package’s bandwidth selectors

• Her web site has Matlab code for cases in which the measurement error is independent of X, including bandwidth selection

Page 14

Problem Considered Here

• Here is a general class of models. Here are W and X

$$W_{ij} = X_i + U_{ij}(X_i)$$

$$E\{U_{ij}(X_i) \mid X_i\} = 0$$

$$\mathrm{var}\{U_{ij}(X_i) \mid X_i\} = \sigma_u^2(X_i)$$

• The W’s are independent given X
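To make the replicate structure concrete, here is a minimal simulation sketch of this model class. The variance function, the distribution of X, and the Gaussian errors are all assumptions made for illustration; the model itself only fixes the conditional mean and variance of the errors.

```python
import random

def sigma2_u(x):
    # Hypothetical variance function; any positive function of x would do.
    return 0.1 + 0.5 * x * x

def sample_variance(values):
    m = len(values)
    mean = sum(values) / m
    return sum((v - mean) ** 2 for v in values) / (m - 1)

def simulate_replicates(n=20_000, m=3, seed=11):
    rng = random.Random(seed)
    xs, ws = [], []
    for _ in range(n):
        x = rng.gauss(1.0, 0.5)
        sd = sigma2_u(x) ** 0.5
        # Replicates are independent given X_i, with mean X_i and variance sigma2_u(X_i).
        ws.append([x + rng.gauss(0.0, sd) for _ in range(m)])
        xs.append(x)
    return xs, ws

xs, ws = simulate_replicates()
n = len(xs)
mean_gap = sum(sum(w) / len(w) - x for x, w in zip(xs, ws)) / n
avg_within_var = sum(sample_variance(w) for w in ws) / n
avg_sigma2 = sum(sigma2_u(x) for x in xs) / n
print(f"mean(Wbar - X) = {mean_gap:.3f}")
print(f"avg within-subject variance = {avg_within_var:.3f}, avg sigma2_u(X) = {avg_sigma2:.3f}")
```

The replicate means are centred on X, and the within-subject sample variances average out to the error variance, which is why replicates carry information about the error distribution.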

Page 15

Background

• There is a substantial econometric literature on technical conditions for identification in many different contexts (S. Schennach, X. Chen, Y. Hu)

• The problem I have stated is known to be nonparametrically identified if there are 3 replicates (and certain technical completeness assumptions hold)

Page 16

Problem Considered Here

• Here is a general class of models. First, Y

$$Y_i = g(X_i) + \epsilon_i(X_i)$$

$$E\{\epsilon_i(X_i) \mid X_i\} = 0$$

$$\mathrm{var}\{\epsilon_i(X_i) \mid X_i\} = \sigma_\epsilon^2(X_i)$$

• The classical heteroscedastic model where the variance is important

• Identified if there are 2 replicate W’s
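A small simulation shows what goes wrong if the replicate mean is simply substituted for X in this regression. The sketch assumes a linear g(x) = beta * x, standard normal X, homoscedastic Gaussian measurement error with unit variance, and a made-up variance function for the response; none of these choices come from the slides. Under these assumptions the naive slope is attenuated by the factor var(X) / {var(X) + var(U)/m}.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_regression(n=20_000, m=2, beta=1.0, seed=3):
    rng = random.Random(seed)
    x, wbar, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        # Hypothetical response noise whose variance depends on X.
        yi = beta * xi + (0.2 + 0.1 * xi * xi) ** 0.5 * rng.gauss(0.0, 1.0)
        reps = [xi + rng.gauss(0.0, 1.0) for _ in range(m)]
        x.append(xi)
        wbar.append(sum(reps) / m)
        y.append(yi)
    return x, wbar, y

x, wbar, y = simulate_regression()
slope_true_x = ols_slope(x, y)     # regression on the unobservable X
slope_naive = ols_slope(wbar, y)   # regression on the mean of the m replicates
print(f"slope using X: {slope_true_x:.3f}, slope using Wbar: {slope_naive:.3f}")
```

With m = 2 and unit variances the attenuation factor is 1 / (1 + 1/2) = 2/3, so the naive slope lands near two thirds of the slope obtained from X.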

Page 17

Background

• The econometric literature invariably uses sieves with orthogonal basis functions

• The theory follows X. Shen’s 1997 paper

Page 18

Background

• In practice, as with non-penalized splines, 5-7 basis functions are used to represent all densities and functions

• Constraints (such as being positive and integrating to 1 for densities) are often ignored

• In the problem I eventually want to solve, the dimension of the two densities = 19 (latent stuff all around)

• Maybe use multivariate Hermite series?

Page 19

Problem Considered Here

• There is no deconvoluting kernel method that does density or regression deconvolution in the context that the distribution of the measurement error depends on X

Page 20

Problem Considered Here

• It seems to me that there are two ways to handle this problem in general
    • Sieves (be an econometrician)
    • Bayesian with flexible models

• Our methodology is explicitly Bayesian, but borrows basis function ideas from the sieve approach

Page 21

Model Formulation

• We borrow from Hu and Schennach’s example and also Staudenmayer, Ruppert and Buonaccorsi

$$W_{ij} = X_i + s_u^{1/2}(X_i)\, U_{ij}$$

$$Y_i = g(X_i) + s_\epsilon^{1/2}(X_i)\, \epsilon_i$$

• Here, U is assumed independent of X

• Also, $\epsilon$ is independent of X

Page 22

Model Formulation

• Our model is

$$W_{ij} = X_i + s_u^{1/2}(X_i)\, U_{ij}$$

$$Y_i = g(X_i) + s_\epsilon^{1/2}(X_i)\, \epsilon_i$$

• Like previous authors, we model $s_\epsilon(X_i)$ and $s_u(X_i)$ as B-splines with positive coefficients

• We model $g(X_i)$ as a B-spline

• As frequentists, we could model the densities of X, U, and $\epsilon$ by sieves, and appeal to Hu and Schennach for theory

• We have not investigated this
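One way to see why positive coefficients give a valid variance function: B-spline basis functions are non-negative and sum to one, so any combination with positive coefficients is itself positive. The sketch below evaluates a cubic B-spline basis with the Cox-de Boor recursion. The knot sequence, the coefficient values, and the use of exp() to enforce positivity are assumptions made for illustration, not details given on the slides.

```python
import math

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion)."""
    n0 = len(knots) - 1
    # Degree 0: indicator of the half-open knot interval containing x.
    basis = [1.0 if knots[j] <= x < knots[j + 1] else 0.0 for j in range(n0)]
    if x == knots[-1]:
        # Assign the right endpoint to the last non-empty interval.
        for j in reversed(range(n0)):
            if knots[j] < knots[j + 1]:
                basis[j] = 1.0
                break
    for d in range(1, degree + 1):
        new = []
        for j in range(n0 - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            left = (x - knots[j]) / left_den * basis[j] if left_den > 0 else 0.0
            right = ((knots[j + d + 1] - x) / right_den * basis[j + 1]
                     if right_den > 0 else 0.0)
            new.append(left + right)
        basis = new
    return basis

def positive_spline(x, knots, degree, theta):
    # exp() keeps every coefficient positive, so the spline is positive too.
    basis = bspline_basis(x, knots, degree)
    return sum(math.exp(t) * b for t, b in zip(theta, basis))

# Clamped cubic knots on [0, 1]: 11 knots give 11 - 1 - 3 = 7 basis functions.
knots = [0.0, 0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0]
theta = [-1.0, 0.5, 0.0, -2.0, 1.0, 0.3, -0.7]   # hypothetical unconstrained values
grid = [i / 20 for i in range(21)]
values = [positive_spline(x, knots, 3, theta) for x in grid]
print(min(values), max(values))
```

The same basis, with unconstrained coefficients, could represent the regression function g.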

Page 23

Model Formulation

• Our model is

$$W_{ij} = X_i + s_u^{1/2}(X_i)\, U_{ij}$$

$$Y_i = g(X_i) + s_\epsilon^{1/2}(X_i)\, \epsilon_i$$

• As Bayesians, we have modeled the densities of X, U, and $\epsilon$ by Dirichlet process mixture models (DPMM)

• We have found that mixtures of normals, with an unknown number of components, are much faster, just as effective, and very stable numerically

Page 24

Model Formulation

• We found that fixing the number of components at a largish number works best

• The method concentrates on a lower number of components (Rousseau and Mengersen found this in a non-measurement error context)

• There are lots of issues involved: (a) starting values; (b) hyper-parameters; (c) MH candidates; (d) constraints (e.g., zero means); (e) data standardization, etc.
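The idea of fixing a largish number of components and letting the surplus ones empty out can be illustrated on the prior side alone. The sketch below assumes a symmetric Dirichlet prior on the mixture weights, which the slides do not state, and compares a small concentration parameter with a large one. It says nothing about posterior behaviour, which is what the cited result concerns.

```python
import random

def dirichlet(rng, alpha, k):
    """Draw symmetric Dirichlet(alpha, ..., alpha) weights via normalized gammas."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [v / total for v in g]

def mean_active(alpha, k=20, draws=2_000, eps=0.01, seed=5):
    """Average number of components whose weight exceeds eps."""
    rng = random.Random(seed)
    count = 0
    for _ in range(draws):
        count += sum(w > eps for w in dirichlet(rng, alpha, k))
    return count / draws

sparse = mean_active(alpha=0.05)   # hypothetical small concentration parameter
diffuse = mean_active(alpha=5.0)   # hypothetical large concentration parameter
print(f"active components: alpha=0.05 -> {sparse:.1f}, alpha=5 -> {diffuse:.1f}")
```

With 20 components, the small concentration value leaves only a handful with non-negligible weight, while the large value spreads weight over nearly all of them.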

Page 25

Model Formulation

• Here is a simulation example of density deconvolution under homoscedasticity, with a mixture of normals for X and a Laplace for U

• The settings come from a paper not by us

• There are 3 replicates, so the density of U is also estimated by our method (we let the deconvoluting kernel density estimator, DKDE, know the truth)

• I ran our R code as is, with no fine tuning

Page 26

Model Formulation

Page 27

Model Formulation

• Here is another example

• Y = sodium intake as measured by a food frequency questionnaire (FFQ, known to be biased)

• W = same thing, but measured by a 24-hour recall (known to be almost unbiased)

• We have R code for this

Page 28

Model Formulation

The dashed line is the Y=X line, indicating the bias of the FFQ

Page 29

Multivariate Deconvolution

• There are also multivariate problems of density deconvolution

• We have found 4 papers about this

• 3 deconvoluting kernel papers, all assume the density of the measurement errors is known

• 1 of those papers has a bandwidth selector

• Bovy et al (2011, AoAS) model X as a mixture of normals, and assume U is independent of X and Gaussian with known covariance matrix. They use an EM algorithm.

Page 30

Multivariate Deconvolution

• We have generalized our 1-dimensional deconvolution approach as

$$W_{ijk} = X_{ij} + s_{uj}^{1/2}(X_{ij})\, U_{ijk}$$

• Again, X is a mixture of multivariate normals, as is U

• However, standard multivariate inverse Wishart computations fail miserably

Page 31

Multivariate Deconvolution

• We have generalized our 1-dimensional deconvolution approach as

$$W_{ijk} = X_{ij} + s_{uj}^{1/2}(X_{ij})\, U_{ijk}$$

• We use a factor analytic representation of the component specific covariance matrices with sparsity inducing shrinkage priors on the factor loading matrices (A. Bhattacharya and D. Dunson)

• This is crucial in flexibly lowering the dimension of the covariance matrices
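The dimension reduction from a factor analytic representation can be sketched by writing a covariance matrix as Sigma = Lambda Lambda^T + diag(psi). The loadings, the uniquenesses, and the choice of q = 3 factors below are hypothetical, and the parameter counts ignore both the rotational non-identifiability of the loadings and the shrinkage prior.

```python
def factor_covariance(loadings, uniquenesses):
    """Return Sigma = Lambda Lambda^T + diag(psi) as a list of lists."""
    p = len(loadings)
    sigma = [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
              for j in range(p)] for i in range(p)]
    for i in range(p):
        sigma[i][i] += uniquenesses[i]
    return sigma

def parameter_counts(p, q):
    """Free parameters: unstructured covariance vs. q-factor representation."""
    return p * (p + 1) // 2, p * q + p

# Hypothetical 4 x 2 loading matrix, e.g. for four nutrients and two factors.
loadings = [[0.9, 0.0], [0.7, 0.2], [0.0, 0.8], [0.1, 0.6]]
psi = [0.3, 0.4, 0.2, 0.5]
sigma = factor_covariance(loadings, psi)

# Dimension 19 is the figure mentioned earlier in the talk; q = 3 is an assumption.
full, factored = parameter_counts(p=19, q=3)
print(f"unstructured: {full} parameters, factor model: {factored} parameters")
```

The resulting matrix is symmetric and positive definite whenever the uniquenesses are positive, and each mixture component needs far fewer parameters than an unstructured covariance.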

Page 32

Multivariate Deconvolution

Multivariate inverse Wisharts on top, Latent factor model on bottom

Blue = MIW, green = MLFA.

Variables are (a) carbs; (b) fiber; (c) protein and (d) potassium

Page 33

Conclusion

• I still want to get to my problem of multiple nutrients/foods, excess zeros and measurement error

• Dimension reduction and flexible models seem a practical way to go

• Final point: for health risk estimation and nutritional surveillance, only a 1-dimensional summary is needed, hence better rates of convergence