Raymond J. Carroll Texas A&M University http://stat.tamu.edu/~carroll [email protected] Postdoctoral Training Program: http://stat.tamu.edu/B3NC Non/Semiparametric Regression and Clustered/Longitudinal Data


Page 1:

Raymond J. Carroll, Texas A&M University
http://stat.tamu.edu/~carroll
[email protected]

Postdoctoral Training Program:http://stat.tamu.edu/B3NC

Non/Semiparametric Regression and Clustered/Longitudinal Data

Page 2:

Where am I From?

[Map of Texas: Wichita Falls, my hometown; College Station, home of Texas A&M; Big Bend National Park]

Page 3:

Acknowledgments

Raymond Carroll, Alan Welsh, Naisyin Wang, Enno Mammen, Xihong Lin, Oliver Linton

The series of papers is on my web site

Lin, Wang and Welsh: Longitudinal data (Mammen & Linton for pseudo-observation methods)

Linton and Mammen: time series data

Page 4:

Outline

• Longitudinal models: panel data

• Background: splines = kernels for independent data

• Correlated data: do splines = kernels?

• Semiparametric case: partially linear model: does it matter what nonparametric method is used?

Page 5:

Panel Data (for simplicity)

• i = 1,…,n clusters/individuals
• j = 1,…,m observations per cluster

Subject   Wave 1   Wave 2   …   Wave m
1         X        X        …   X
2         X        X        …   X
…
n         X        X        …   X

Page 6:

Panel Data (for simplicity)

• i = 1,…,n clusters/individuals
• j = 1,…,m observations per cluster
• Important points:
  • The cluster size m is meant to be fixed
  • This is not a multiple time series problem where the cluster size increases to infinity
  • Some comments on the single time series problem are given near the end of the talk

Page 7:

The Marginal Nonparametric Model

• Y = Response
• X = time-varying covariate
• Question: can we improve efficiency by accounting for correlation?

Y_ij = Θ(X_ij) + ε_ij,  Θ(·) an unknown function,  cov(ε_i) = Σ

Page 8:

The Marginal Nonparametric Model

• Important assumption: covariates at other waves are not conditionally predictive, i.e., they are surrogates
• This assumption is required for any GLS fit, including parametric GLS

E(Y_ij | X_ij, X_ik for k ≠ j) = Θ(X_ij)

Page 9:

Independent Data

• Splines (smoothing, P-splines, etc.) with penalty parameter λ
• Ridge regression fit
• Some bias, smaller variance
• λ = 0 is over-parameterized least squares
• λ = ∞ is a polynomial regression

minimize Σ_{i=1}^n {Y_i − Θ(X_i)}² + λ ∫ {Θ″(t)}² dt
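This penalized criterion can be sketched as a ridge regression on a spline basis. The following is a minimal illustration, not the estimator from the papers: the truncated-line basis, knot placement, and penalty matrix are all illustrative choices.

```python
import numpy as np

def pspline_fit(x, y, num_knots=10, lam=1.0):
    """Minimal P-spline sketch: ridge regression on a truncated-line basis.

    lam is the smoothing parameter: lam = 0 gives over-parameterized
    least squares; lam -> infinity shrinks to the unpenalized line.
    """
    knots = np.linspace(x.min(), x.max(), num_knots + 2)[1:-1]
    basis = lambda v: np.column_stack(
        [np.ones_like(v), v] + [np.maximum(v - k, 0.0) for k in knots])
    B = basis(x)
    # Penalize only the truncated-line coefficients (the "wiggly" part)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    coef = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
    return lambda t: basis(np.asarray(t, dtype=float)) @ coef
```

Because the intercept and slope are unpenalized, a straight-line signal is reproduced for any λ, illustrating the "λ = ∞ is a polynomial regression" endpoint.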

Page 10:

Independent Data

• Kernels (local averages, local linear, etc.), with kernel density function K and bandwidth h
• As the bandwidth h → 0, only observations with X near t get any weight in the fit

Θ̂(t) = [n⁻¹ Σ_{i=1}^n Y_i K{(t − X_i)/h}] / [n⁻¹ Σ_{i=1}^n K{(t − X_i)/h}]
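The displayed ratio is the local-average (Nadaraya-Watson) estimator; a minimal sketch with a Gaussian kernel (the kernel choice and bandwidth are illustrative):

```python
import numpy as np

def nw_kernel_fit(x, y, t, h):
    """Nadaraya-Watson sketch: a locally weighted average of the Y_i,
    with Gaussian kernel K and bandwidth h."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # K{(t - X_i)/h} for every (evaluation point, observation) pair
    W = np.exp(-0.5 * ((t[:, None] - x[None, :]) / h) ** 2)
    return (W @ y) / W.sum(axis=1)
```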

Page 11:

Independent Data

• Major methods: splines, kernels
• Smoothing parameters required for both
• Fits: similar in many (most?) datasets
• Expectation: some combination of bandwidths and kernel functions looks like splines

Page 12:

Independent Data

• Splines and kernels are linear in the responses:

Θ̂(t) = n⁻¹ Σ_{i=1}^n G_n(t, X_i) Y_i

• Silverman showed that there is a kernel function and a bandwidth so that the weight functions G_n(t, x) are asymptotically equivalent
• In this sense, splines = kernels
• This talk is about the same result for correlated data
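Because any linear smoother can be written this way, its weights can be read off numerically by smoothing unit response vectors. A small sketch; the inner Nadaraya-Watson smoother is an illustrative stand-in for whichever spline or kernel fit is being probed:

```python
import numpy as np

def smoother_weights(smoother, x, t):
    """Recover the weights w_i(t) of a linear smoother
    Theta_hat(t) = sum_i w_i(t) Y_i by smoothing the unit vectors e_i."""
    n = len(x)
    w = np.empty(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        w[i] = smoother(x, e, t)
    return w  # Theta_hat(t) = w @ y for every response vector y

def nw_at(x, y, t, h=0.2):
    """A simple linear smoother to probe: Nadaraya-Watson at one point."""
    k = np.exp(-0.5 * ((t - x) / h) ** 2)
    return float(k @ y / k.sum())
```

Plotting `smoother_weights` against x is exactly how the G_n(t = 0.25, x) pictures on the following slides are produced for a given smoother.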

Page 13:

[Figure: the weight functions G_n(t = 0.25, x) in a specific case for independent data; panels: kernel, smoothing spline]

Note the similarity of shape and the locality: only X’s near t=0.25 get any weight

Page 14:

Working Independence

• Working independence: ignore all correlations
• Fix up standard errors at the end
• Advantage: the assumption E(Y_ij | X_ij, X_ik for k ≠ j) = Θ(X_ij) is not required
• Disadvantage: possible severe loss of efficiency if carried too far

Page 15:

Working Independence

• Working independence: ignore all correlations
• Should posit some reasonable marginal variances
• Weighting is important for efficiency
• Weighted versions: splines and kernels have obvious analogues
• Standard methods: Zeger & Diggle; Hoover, Rice, Wu & Yang; Lin & Ying; etc.

Page 16:

Working Independence

• Working independence: weighted splines and weighted kernels are linear in the responses
• The Silverman result still holds
• In this sense, splines = kernels

Page 17:

Accounting for Correlation

• Splines have an obvious analogue for non-independent data
• Let Σ_w be a working covariance matrix
• Penalized generalized least squares (GLS):

minimize Σ_{i=1}^n {Y_i − Θ(X_i)}ᵀ Σ_w⁻¹ {Y_i − Θ(X_i)} + λ ∫ {Θ″(t)}² dt

• GLS ridge regression
• Because splines are based on likelihood ideas, they generalize quickly to new problems
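The penalized GLS criterion can be sketched directly for clustered data. As before, the truncated-line basis, knot count, and the assumption of one working covariance shared across clusters are illustrative choices, not the construction from the papers:

```python
import numpy as np

def gls_pspline_fit(X, Y, Sigma_w, num_knots=8, lam=1.0):
    """Penalized GLS spline sketch for clustered data.

    X, Y: (n, m) covariates/responses, n clusters of size m.
    Sigma_w: (m, m) working covariance shared across clusters.
    Minimizes sum_i {Y_i - Theta(X_i)}' Sigma_w^{-1} {Y_i - Theta(X_i)}
    plus a ridge penalty on the truncated-line spline coefficients.
    """
    x = X.ravel()
    knots = np.linspace(x.min(), x.max(), num_knots + 2)[1:-1]
    basis = lambda v: np.column_stack(
        [np.ones_like(v), v] + [np.maximum(v - k, 0.0) for k in knots])
    Sinv = np.linalg.inv(Sigma_w)
    p = 2 + len(knots)
    BtWB = np.zeros((p, p))
    BtWy = np.zeros(p)
    for Xi, Yi in zip(X, Y):      # accumulate the GLS normal equations
        Bi = basis(Xi)            # (m, p) basis rows for one cluster
        BtWB += Bi.T @ Sinv @ Bi
        BtWy += Bi.T @ Sinv @ Yi
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))  # penalize wiggly part only
    coef = np.linalg.solve(BtWB + lam * D, BtWy)
    return lambda t: basis(np.asarray(t, dtype=float)) @ coef
```

With Sigma_w equal to the identity this collapses to the working-independence spline, which is one way to see that the two estimators differ only through the weighting.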

Page 18:

Accounting for Correlation

• Splines have an obvious analogue for non-independent data
• Kernels are not so obvious
• Local likelihood kernel ideas are standard in independent data problems
• Most attempts at kernels for correlated data have tried to use local likelihood kernel methods

Page 19:

Kernels and Correlation

• Problem: how to define locality for kernels?
• Goal: estimate the function at t
• Let K_h(t, X_i) be a diagonal matrix of standard kernel weights
• Standard kernel method: GLS pretending the inverse covariance matrix is

K_h^{1/2}(t, X_i) Σ_w⁻¹ K_h^{1/2}(t, X_i)

• The estimate is inherently local
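A local-constant sketch of this "standard kernel" GLS fit; the Gaussian kernel and a shared working covariance are illustrative:

```python
import numpy as np

def kernel_gls_fit(X, Y, Sigma_w, t, h):
    """Local-constant sketch of the 'standard kernel' GLS estimate at t.

    Pretends cluster i has inverse covariance
    K_h^{1/2}(t, X_i) Sigma_w^{-1} K_h^{1/2}(t, X_i),
    with K_h a diagonal matrix of Gaussian kernel weights, and solves
    the resulting GLS estimating equation for a local constant.
    """
    Sinv = np.linalg.inv(Sigma_w)
    num, den = 0.0, 0.0
    for Xi, Yi in zip(X, Y):
        # K_h^{1/2}: square roots of the Gaussian kernel weights
        Khalf = np.diag(np.exp(-0.25 * ((t - Xi) / h) ** 2))
        Ai = Khalf @ Sinv @ Khalf
        one = np.ones(len(Xi))
        num += one @ Ai @ Yi
        den += one @ Ai @ one
    return num / den
```

Because both kernel factors vanish away from t, an observation far from t gets no weight regardless of Σ_w, which is the locality the slide refers to.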

Page 20:

Kernels and Correlation

Specific case: m = 3, n = 35; exchangeable correlation structure

Red: ρ = 0.0; Green: ρ = 0.4; Blue: ρ = 0.8

[Figure: the weight functions G_n(t = 0.25, x) in a specific case]

Note the locality of the kernel method

Page 21:

Splines and Correlation

Specific case: m = 3, n = 35; exchangeable correlation structure

Red: ρ = 0.0; Green: ρ = 0.4; Blue: ρ = 0.8

[Figure: the weight functions G_n(t = 0.25, x) in a specific case]

Note the lack of locality of the spline method

Page 22:

Splines and Correlation

Specific case: m = 3, n = 35; complex correlation structure

Red: nearly singular; Green: ρ = 0.0; Blue: AR(0.8)

[Figure: the weight functions G_n(t = 0.25, x) in a specific case]

Note the lack of locality of the spline method

Page 23:

Splines and Standard Kernels

• Accounting for correlation: standard kernels remain local; splines are not local

• Numerical results can be confirmed theoretically

• Don’t we want our nonparametric regression estimates to be local?

Page 24:

Results on Kernels and Correlation

• GLS with weights K_h^{1/2}(t, X_i) Σ_w⁻¹ K_h^{1/2}(t, X_i)
• Optimal working covariance matrix is working independence!
• Using the correct covariance matrix increases variance and increases MSE
• Splines ≠ kernels (or at least these kernels)

Page 25:

Pseudo-Observation Kernel Methods

• Better kernel methods are possible
• Pseudo-observation: the original method
• Construction: a specific linear transformation of Y with mean Θ(X) and diagonal covariance matrix
• This adjusts the original responses without affecting the mean

Y*_i = Y_i − (I − Λ^{1/2} Σ_w^{−1/2}) {Y_i − Θ̂(X_i)},  Λ = diag(Σ_w)
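One concrete way to realize such a transformation, as a sketch: it matches the two properties stated on the slide (the mean Θ(X) is preserved, the working covariance becomes diagonal), though the exact form used in the papers may differ.

```python
import numpy as np

def pseudo_observations(Y, Theta_X, Sigma_w):
    """Sketch of a pseudo-observation transform: keep the mean Theta(X),
    make the working covariance diagonal.

    Residuals are pre-whitened by Sigma_w^{-1/2} and rescaled by
    Lambda^{1/2} with Lambda = diag(Sigma_w), so if cov(Y_i) = Sigma_w
    the transformed responses have covariance Lambda (diagonal).
    Y, Theta_X: (n, m) responses and current mean estimates.
    """
    vals, vecs = np.linalg.eigh(Sigma_w)
    Sinv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T   # Sigma_w^{-1/2}
    Lam_half = np.diag(np.sqrt(np.diag(Sigma_w)))       # Lambda^{1/2}
    A = Lam_half @ Sinv_half
    # Y* = Theta(X) + Lambda^{1/2} Sigma_w^{-1/2} {Y - Theta(X)}
    return Theta_X + (Y - Theta_X) @ A.T
```

Since Θ̂ appears on the right-hand side, the transform is applied iteratively: smooth, transform, re-smooth.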

Page 26:

Pseudo-Observation Kernel Methods

• Construction: a specific linear transformation of Y with mean Θ(X) and diagonal covariance

Y*_i = Y_i − (I − Λ^{1/2} Σ_w^{−1/2}) {Y_i − Θ̂(X_i)}

• Iterative
• Efficiency: more efficient than working independence
• Proof of principle: kernel methods can be constructed to take advantage of correlation

Page 27:

[Bar chart: relative efficiency (vertical axis 0 to 3.5) of the spline and P-kernel estimators under the Exchng, AR, and Near Sing correlation structures]

Efficiency of Splines and Pseudo-Observation Kernels

Exchng: Exchangeable with correlation 0.6

AR: autoregressive with correlation 0.6

Near Sing: A nearly singular matrix

Page 28:

Better Kernel Methods: SUR

• Simulations of the original pseudo-observation method: it is not as efficient as splines
• This suggests room for a better estimate
• Naisyin Wang: her talk will describe an even better kernel method
• Basis: seemingly unrelated regression (SUR) ideas
• Generalizable: based on likelihood ideas

Page 29:

SUR Kernel Methods

• It is well known that the GLS spline has an exact, analytic expression

• We have shown that the Wang SUR kernel method has an exact, analytic expression

• Both methods are linear in the responses

Page 30:

SUR Kernel Methods

• The two methods differ only in one matrix term

• This turns out to be exactly the same matrix term considered by Silverman in his work

• Relatively nontrivial calculations show that Silverman’s result still holds

• Splines = SUR Kernels


Page 31:

Nonlocality

• The lack of locality of GLS splines and SUR kernels is surprising

• Suppose we want to estimate the function at t

• If any observation has an X near t, then all observations in the cluster contribute to the fit, not just those with covariates near t

• Splines, pseudo-kernels and SUR kernels all borrow strength

Page 32:

Nonlocality

• Wang’s SUR kernels = BLUP-like pseudo-kernels with a clever linear transformation. Let Σ⁻¹ = (σ^{jk}), with weights σ^{jj}
• SUR kernels are working independence kernels applied to the pseudo-observations

Y*_ij = Y_ij + Σ_{k≠j} (σ^{jk}/σ^{jj}) {Y_ik − Θ̂(X_ik)}
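The displayed pseudo-observation formula can be sketched directly; the current Θ̂ values are passed in, so one call is one iteration of the idea:

```python
import numpy as np

def sur_pseudo_observations(Y, Theta_X, Sigma):
    """SUR-style pseudo-observations, as displayed on the slide.

    With Sigma^{-1} = (sigma^{jk}),
    Y*_ij = Y_ij + sum_{k != j} (sigma^{jk}/sigma^{jj}) {Y_ik - Theta(X_ik)}.
    Y, Theta_X: (n, m) responses and current mean estimates.
    """
    Sinv = np.linalg.inv(Sigma)
    resid = Y - Theta_X
    # off-diagonal part of Sigma^{-1}, row j scaled by 1/sigma^{jj}
    B = (Sinv - np.diag(np.diag(Sinv))) / np.diag(Sinv)[:, None]
    return Y + resid @ B.T
```

With a diagonal Σ the adjustment vanishes and Y* = Y, recovering working independence.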

Page 33:

Locality of Kernels

• Original pseudo-observation method: pseudo-observations are uncorrelated
• SUR kernels: pseudo-observations are correlated
• SUR kernels are not local
• SUR kernels are local in (the same!) pseudo-observations

Y*_ij = Y_ij + Σ_{k≠j} (σ^{jk}/σ^{jj}) {Y_ik − Θ̂(X_ik)}

Page 34:

Locality of Splines

• Splines = SUR kernels (Silverman-type result)
• GLS spline: iterative; standard independent spline smoothing; SUR pseudo-observations at each iteration
• GLS splines are not local
• GLS splines are local in (the same!) pseudo-observations

Page 35:

Time Series Problems

• Time series problems: many of the same issues arise
• Original pseudo-observation method: two stages; a linear transformation; mean Θ(X); independent errors; a single standard kernel applied
• Potential for great gains in efficiency (even infinite for AR problems with large correlation)

Page 36:

Time Series: AR(1) Illustration, First Pseudo Observation Method

• AR(1), correlation ρ:

ε_t − ρ ε_{t−1} = u_t (white noise)

Y⁰_t = Y_t − ρ {Y_{t−1} − Θ(X_{t−1})}

• Regress Y⁰_t on X_t
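A sketch of this first pseudo-observation step; the Θ̂(X_{t−1}) values are passed in, and substituting the AR(1) recursion shows that the pseudo-errors are exactly the white noise u_t:

```python
import numpy as np

def ar1_pseudo_observations(y, theta_xlag, rho):
    """First pseudo-observation step for AR(1) errors (a sketch).

    For Y_t = Theta(X_t) + eps_t with eps_t = rho*eps_{t-1} + u_t,
    Y0_t = Y_t - rho*{Y_{t-1} - Theta(X_{t-1})} = Theta(X_t) + u_t,
    so Y0_t can be regressed on X_t with any independent-data smoother.
    theta_xlag: current estimates of Theta(X_{t-1}), t = 1, ..., T-1.
    """
    return y[1:] - rho * (y[:-1] - theta_xlag)
```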

Page 37:

Time Series Problems

• More efficient methods can be constructed
• Series of regression problems: for all j, pseudo-observations Y^j_t with mean Θ(X_{t−j}) and white-noise errors
• Regress for each j: the fits are asymptotically independent
• Then take a weighted average
• A time series version of SUR kernels for longitudinal data?

Page 38:

Time Series: AR(1) Illustration, New Pseudo Observation Method

• AR(1), correlation ρ:

ε_t − ρ ε_{t−1} = u_t

Y⁰_t = Y_t − ρ {Y_{t−1} − Θ(X_{t−1})}
Y¹_t = Y_{t−1} − ρ {Y_t − Θ(X_t)}

• Regress Y⁰_t on X_t and Y¹_t on X_{t−1}
• Weights: 1 and ρ²

Page 39:

Time Series Problems

• AR(1) errors with correlation ρ
• Efficiency of the original pseudo-observation method to working independence: 1/(1 − ρ²) : 1, → ∞ as ρ → 1
• Efficiency of the new (SUR?) pseudo-observation method to the original method: (1 + ρ²) : 1, → 2 as ρ → 1

Page 40:

The Semiparametric Model

• Y = Response
• X, Z = time-varying covariates
• Question: can we improve efficiency for β by accounting for correlation?

Y_ij = Z_ijᵀ β + Θ(X_ij) + ε_ij,  cov(ε_i) = Σ

Page 41:

Profile Methods

• Given β, solve for Θ̂(X_ij, β), say
• Basic idea: regress Y*_ij = Y_ij − Z_ijᵀ β on X_ij, using:
  • working independence
  • standard kernels
  • pseudo-observation kernels
  • SUR kernels

Y_ij = Z_ijᵀ β + Θ(X_ij) + ε_ij,  cov(ε_i) = Σ
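A working-independence sketch of the profile idea for the partially linear model: with a linear smoother, profiling out Θ has a closed form. The Nadaraya-Watson smoother and the bandwidth are illustrative stand-ins for whichever of the four nonparametric methods is chosen.

```python
import numpy as np

def profile_partially_linear(Z, x, y, h):
    """Working-independence profiling sketch for Y = Z beta + Theta(X) + eps.

    With a linear smoother S (here a Nadaraya-Watson hat matrix with
    Gaussian kernel and bandwidth h), profiling out Theta gives
        beta_hat = [(Z - SZ)'(Z - SZ)]^{-1} (Z - SZ)'(y - Sy),
    and Theta_hat at the X's is the smooth of y - Z beta_hat.
    """
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    S = W / W.sum(axis=1, keepdims=True)   # NW hat matrix
    Zt = Z - S @ Z                          # partial out the smooth part of Z
    yt = y - S @ y
    beta_hat = np.linalg.solve(Zt.T @ Zt, Zt.T @ yt)
    theta_hat = S @ (y - Z @ beta_hat)      # Theta_hat evaluated at the X's
    return beta_hat, theta_hat
```

Swapping S for a weighted, pseudo-observation, or SUR smoother changes the properties of β̂, which is exactly the question the next slides raise.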

Page 42:

Profile Methods

• Given β, solve for Θ̂(X_ij, β), say
• Then fit GLS or W.I. to the model with mean Z_ijᵀ β + Θ̂(X_ij, β)
• Question: does it matter what kernel method is used?
• Question: how bad is using W.I. everywhere?
• Question: are there efficient choices?

Y_ij = Z_ijᵀ β + Θ(X_ij) + ε_ij,  cov(ε_i) = Σ

Page 43:

The Semiparametric Model: Special Case

• If X does not vary with time, a simple semiparametric efficient method is available
• The basic point is that Y_ij − Z_ijᵀ β has common mean Θ(X_i) and covariance matrix Σ
• If Θ(·) were a polynomial, GLS likelihood methods would be natural

Y_ij = Z_ijᵀ β + Θ(X_i) + ε_ij

Page 44:

The Semiparametric Model: Special Case

• Method: replace the polynomial GLS likelihood with a GLS local likelihood with kernel weights K{(t − X_i)/h}
• Then do GLS on the derived variable Z_ijᵀ β + Θ̂(X_i, β)
• Semiparametric efficient

Y_ij = Z_ijᵀ β + Θ(X_i) + ε_ij

Page 45:

Profile Method: General Case

• Given β, solve for Θ̂(X_ij, β), say
• Then fit GLS or W.I. to the model with mean Z_ijᵀ β + Θ̂(X_ij, β)
• In this general case, how you estimate Θ matters:
  • working independence
  • standard kernel
  • pseudo-observation kernel
  • SUR kernel

Page 46:

Profile Methods

• In this general case, how you estimate Θ matters:
  • working independence
  • standard kernel
  • pseudo-observation kernel
  • SUR kernel
• We have published the asymptotically efficient score, but not how to implement it

Page 47:

Profile Methods

• Naisyin Wang’s talk will describe:
  • these phenomena
  • the search for an efficient estimator
  • the loss of efficiency from using working independence to estimate Θ
  • examples where ignoring the correlation can change conclusions

Page 48:

Conclusions (1/3): Nonparametric Regression

• In nonparametric regression:
  • kernels = splines for working independence (W.I.)
  • weighting is important for W.I.
  • working independence is inefficient
  • standard kernels ≠ splines for correlated data

Page 49:

Conclusions (2/3): Nonparametric Regression

• In nonparametric regression:
  • pseudo-observation methods improve upon working independence
  • SUR kernels = splines for correlated data
  • splines and SUR kernels are not local
  • splines and SUR kernels are local in pseudo-observations

Page 50:

Conclusions (3/3): Semiparametric Regression

• In semiparametric regression:
  • profile methods are a general class
  • fully efficient parameter estimates are easily constructed if X is not time-varying
  • when X is time-varying, the method of estimating Θ affects the properties of the parameter estimates
  • ignoring correlations can change conclusions (see N. Wang’s talk)

Page 51:

Conclusions: Splines versus Kernels

• One has to be struck by the fact that all the grief in this problem has come from trying to define kernel methods
• At the end of the day, they are no more efficient than splines, and they are harder and more subtle to define
• Showing the equivalence, as we have done, suggests the good properties of splines