Page 1

John R. Stevens, Utah State University

Notes 2. Statistical Methods I

Mathematics Educators Workshop 28 March 2009


Advanced Statistical Methods:

Beyond Linear Regression

http://www.stat.usu.edu/~jrstevens/pcmi

Page 2

What would your students know to do with these data?

Obs  Flight  Temp  Damage
  1  STS1     66   NO
  2  STS9     70   NO
  3  STS51B   75   NO
  4  STS2     70   YES
  5  STS41B   57   YES
  6  STS51G   70   NO
  7  STS3     69   NO
  8  STS41C   63   YES
  9  STS51F   81   NO
 10  STS4     80
 11  STS41D   70   YES
 12  STS51I   76   NO
 13  STS5     68   NO
 14  STS41G   78   NO
 15  STS51J   79   NO
 16  STS6     67   NO
 17  STS51A   67   NO
 18  STS61A   75   YES
 19  STS7     72   NO
 20  STS51C   53   YES
 21  STS61B   76   NO
 22  STS8     73   NO
 23  STS51D   67   NO
 24  STS61C   58   YES
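For readers who want to work with these numbers directly, here is a minimal R sketch that enters the table above as a data frame (the object and column names are mine; the damage status of STS4 is blank in the table, so it is coded NA):

  # O-ring data from the table above; damage status of STS4 is unknown (NA)
  orings <- data.frame(
    flight = c("STS1","STS9","STS51B","STS2","STS41B","STS51G","STS3","STS41C",
               "STS51F","STS4","STS41D","STS51I","STS5","STS41G","STS51J","STS6",
               "STS51A","STS61A","STS7","STS51C","STS61B","STS8","STS51D","STS61C"),
    temp   = c(66,70,75,70,57,70,69,63,81,80,70,76,68,78,79,67,67,75,72,53,76,73,67,58),
    damage = factor(c("NO","NO","NO","YES","YES","NO","NO","YES","NO",NA,"YES","NO",
                      "NO","NO","NO","NO","NO","YES","NO","YES","NO","NO","NO","YES"))
  )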

Page 3

Two Sample t-test

data:  Temp by Damage
t = 3.1032, df = 21, p-value = 0.005383
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  2.774344 14.047085
sample estimates:
 mean in group NO mean in group YES
         72.12500          63.71429
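This is the output R's t.test() gives for a pooled two-sample test (df = 21 points to var.equal = TRUE, with the one flight of unknown damage status dropped); a hedged sketch using the orings data frame entered above:

  # Pooled two-sample t-test of launch temperature by damage status;
  # the flight with missing damage status is dropped automatically
  t.test(temp ~ damage, data = orings, var.equal = TRUE)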

Page 4

Does the t-test make sense here?

Traditional: Treatment Group mean vs. Control Group mean

What is the response variable?
  Temperature? [Quantitative, Continuous]
  Damage? [Qualitative]

Page 5

Traditional Statistical Model 1 – Linear Regression: predict continuous response from [quantitative] predictors

  Y = weight, X = height
  Y = income, X = education level
  Y = first-semester GPA, X = parent's income
  Y = temperature, X = damage (0 = no, 1 = yes)

Can also "control for" other [possibly categorical] factors ("covariates"):
  Sex
  Major
  State of Origin
  Number of Siblings

Page 6

Traditional Statistical Model 2 – Logistic Regression: predict binary response from [quantitative] predictors

  Y = 'graduate within 5 years' = 0 vs. Y = 'not' = 1,   X = first-semester GPA
  Y = 0 (no damage) vs. Y = 1 (damage),   X = temperature
  Y = 0 (survive) vs. Y = 1 (death),   X = dosage (dose-response model)

Can also "control" for other factors, or "covariates":
  Race, Sex
  Genotype

p = P(Y=1 | relevant factors) = prob. that Y=1, given state of relevant factors

Page 7

Traditional Dose-Response Model

p = Probability of “death” at dose d:

Look at what affects the shape of the curve, LD50 (lethal dose for 50% efficacy), etc.

log( p / (1 − p) ) = β0 + β1 d

[Figure: "Dose-Response Curve" – p (0 to 1) plotted against d (0 to 1)]

Page 8

“Fitting” the Dose-Response Model

Why "logistic" regression?
  β0 = place-holder constant
  β1 = effect of "dosage" d

To estimate parameters:

Newton-Raphson iterative process to “maximize the likelihood” of the model

Compare Y=0 (no damage) with Y=1 (damage) groups

log( p / (1 − p) ) = β0 + β1 d

Page 9

Likelihood Function (to be maximized)

For observation i, with yi ∈ {0, 1}:

  Pr(Yi = 1) = pi,   Pr(Yi = 0) = 1 − pi

  f(yi) = pi^yi (1 − pi)^(1 − yi)          ← likelihood for obs. i

L(β0, β1) = ∏i pi^yi (1 − pi)^(1 − yi)     ← multiply probabilities (independence)

l(β0, β1) = log L(β0, β1)

Page 10

Estimation by IRLS (Iteratively Reweighted Least Squares)

equivalent: Newton-Raphson algorithm for iteratively solving the "score" equations

  ∂ l(β0, β1) / ∂ βj = 0,   j = 0, 1
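The IRLS idea is compact enough to code directly. A bare-bones sketch for simple logistic regression (illustrative code written for these notes, not the speaker's; the function name irls_logistic is made up):

  # Iteratively Reweighted Least Squares for logit(p) = b0 + b1*x
  irls_logistic <- function(x, y, tol = 1e-10, maxit = 50) {
    X    <- cbind(1, x)               # design matrix
    beta <- c(0, 0)                   # starting values
    for (it in 1:maxit) {
      eta <- drop(X %*% beta)         # linear predictor
      p   <- 1 / (1 + exp(-eta))      # fitted probabilities
      W   <- p * (1 - p)              # IRLS weights
      z   <- eta + (y - p) / W        # working response
      beta_new <- solve(t(X) %*% (W * X), t(X) %*% (W * z))
      if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
      beta <- beta_new
    }
    drop(beta)
  }

  # e.g. with the O-ring data entered earlier (dropping the flight with missing damage):
  # ok <- !is.na(orings$damage)
  # irls_logistic(orings$temp[ok], as.numeric(orings$damage[ok] == "YES"))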

Page 11

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  15.0429     7.3786   2.039   0.0415 *
Temp         -0.2322     0.1082  -2.145   0.0320 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
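A coefficient table like this comes from R's glm() with a binomial family; a hedged sketch (again using the orings data frame named above; the estimates should land close to the 15.04 and -0.23 shown):

  # Logistic regression of damage (YES = 1) on launch temperature
  fit <- glm(I(damage == "YES") ~ temp, family = binomial, data = orings)
  summary(fit)$coefficients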

Page 12

[Figure: fitted probability curve, p̂ = …]

Page 13

What if the data were even "better"?  Complete separation of points.

What should happen to our "slope" estimate?

Page 14

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    928.9   913821.4   0.001        1
Temp           -14.4    14106.7  -0.001        1

[Figure: fitted probability curve, p̂ = …]
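You can reproduce this behaviour with an artificial, perfectly separated version of the data (an illustrative sketch; the 65-degree cutoff is mine, chosen only to force complete separation):

  # Hypothetical perfectly separated data: damage if and only if temp < 65
  x <- orings$temp
  y <- as.numeric(x < 65)
  glm(y ~ x, family = binomial)
  # glm() warns that fitted probabilities numerically 0 or 1 occurred;
  # the slope estimate wanders off toward infinity and the standard errors explode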

Page 15

Failure?

Shape of likelihood function

Large Standard Errors

Solution only in 2006

Rather than maximizing likelihood, consider a penalty:

l̃(β0, β1) = l(β0, β1) + .5 ("magnitude of variance" of β̂0, β̂1)

Page 16

Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood

                  coef   se(coef)    Chisq            p
(Intercept) 30.4129282 16.5145441 11.35235 0.0007535240
Temp        -0.4832632  0.2528934 13.06178 0.0003013835

[Figure: fitted probability curve, p̂ = …]
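One readily available implementation of penalized (Firth-type) logistic regression with profile-likelihood inference is the logistf package in R; a hedged sketch (the package choice is mine, not necessarily what produced the output above):

  # Firth-penalized logistic regression with profile-likelihood intervals
  # install.packages("logistf")        # one-time installation
  library(logistf)
  logistf(y ~ x)                       # y, x from the separated example above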

Page 17

Beetle Data

Phosphine       Total      Total    Total       Survivors Observed at Genotype
Dosage (mg/L)   Receiving  Deaths   Survivors   -/B  -/H  -/A  +/B  +/H  +/A
                Dosage
 0                  98        0        98        31   27   10    6   20    4
 0.003             100       16        84        18   26   10    6   20    4
 0.004             100       68        32        10    4    3    5    7    4
 0.005             100       78        22         1    4    7    2    6    2
 0.01              100       77        23         0    1    9    8    5    0
 0.05              300      270        30         0    0    0    5   20    5
 0.1               400      383        17         0    0    0    0   10    7
 0.2               750      740        10         0    0    0    0    0   10
 0.3               500      490        10         0    0    0    0    0   10
 0.4               500      492         8         0    0    0    0    0    8
 1.0             7,850    7,806        44         0    0    0    0    0   44
                10,798   10,420       378
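A sketch entering these aggregated counts in R (the object names beetles and surv are mine); later sketches in these notes reuse them:

  # Total dosed and total deaths at each phosphine dose
  beetles <- data.frame(
    dose    = c(0, 0.003, 0.004, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 1.0),
    n_dosed = c(98, 100, 100, 100, 100, 300, 400, 750, 500, 500, 7850),
    deaths  = c(0, 16, 68, 78, 77, 270, 383, 740, 490, 492, 7806)
  )
  # Survivors by genotype at each dose (genotype of dead beetles is unobserved)
  surv <- rbind(
    c(31, 27, 10, 6, 20,  4), c(18, 26, 10, 6, 20,  4), c(10, 4, 3, 5,  7,  4),
    c( 1,  4,  7, 2,  6,  2), c( 0,  1,  9, 8,  5,  0), c( 0, 0, 0, 5, 20,  5),
    c( 0,  0,  0, 0, 10,  7), c( 0,  0,  0, 0,  0, 10), c( 0, 0, 0, 0,  0, 10),
    c( 0,  0,  0, 0,  0,  8), c( 0,  0,  0, 0,  0, 44))
  colnames(surv) <- c("-/B", "-/H", "-/A", "+/B", "+/H", "+/A")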

Page 18

Dose-response model

Recall the simple model:

  log( p / (1 − p) ) = β0 + β1 d

pij = Pr(Y=1 | dosage level j and genotype level i)

But – when is genotype (covariate Gi) observed?

log( pij / (1 − pij) ) = Gi + Di dj

Page 19

Coefficients:
               Estimate Std. Error   z value Pr(>|z|)
(Intercept)  -2.657e+01  8.901e+04 -2.98e-04        1
dose         -7.541e-26  1.596e+07 -4.72e-33        1
G1+          -3.386e-28  1.064e+05 -3.18e-33        1
G2B          -1.344e-14  1.092e+05 -1.23e-19        1
G2H          -3.349e-28  1.095e+05 -3.06e-33        1
dose:G1+      7.541e-26  1.596e+07  4.72e-33        1
dose:G2B      3.984e-12  3.075e+07  1.30e-19        1
dose:G2H      7.754e-26  2.760e+07  2.81e-33        1
G1+:G2B       1.344e-14  1.465e+05  9.17e-20        1
G1+:G2H       3.395e-28  1.327e+05  2.56e-33        1
dose:G1+:G2B -3.984e-12  3.098e+07 -1.29e-19        1
dose:G1+:G2H -7.756e-26  2.763e+07 -2.81e-33        1

Before we “fix” this, first a little detour …

Page 20

A Multivariate Gaussian Mixture

Component j is MVN(μj, Σj) with proportion πj
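A small R sketch of what such a mixture looks like when simulated (the component means, covariances, and mixing proportion below are made up purely for illustration):

  # Simulate 500 points from a two-component bivariate Gaussian mixture
  library(MASS)                        # for mvrnorm()
  set.seed(1)
  n   <- 500
  pi1 <- 0.4                           # mixing proportion of component 1
  z   <- rbinom(n, 1, pi1)             # latent component labels
  x   <- t(sapply(z, function(zi)
           if (zi == 1) mvrnorm(1, mu = c(0, 0), Sigma = diag(2))
           else         mvrnorm(1, mu = c(3, 3), Sigma = matrix(c(1, 0.5, 0.5, 1), 2))))
  plot(x, col = z + 1, pch = 16)       # colour points by their true component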

Page 21

The Maximum Likelihood Approach

Page 22

A Possible Work-Around

l(θ | Z) = Σ(i=1…n) Σ(j=1…J) Δij log[ πj φ(yi | μj, Σj) ],   Δij = I(obs. i in group j)

Keys here:

1. the true group memberships Δ are unknown (latent)

2. statisticians specialize in unknown quantities

Page 23

A reasonable approach

1. Randomly assign group memberships Δ, and estimate group means μj , covariance matrices Σj , and mixing proportions πj

2. Given those values, calculate (for each obs.) ξj = E[Δj|θ] = P(obs. in group j)

3. Update estimates for μj , Σj , and πj , weighting each observation by these ξ :

4. Repeat steps 2 and 3 to convergence

μ̂j = Σi ξij yi / Σi ξij    (shown here for the group means; the covariance matrices and mixing proportions are updated with the same weights)

(A code sketch of these four steps follows below.)
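Here is a bare-bones R implementation of those four steps for a multivariate Gaussian mixture (my own illustrative code, not the speaker's; the names dmvn and em_mixture are made up, and x can be the matrix simulated in the earlier sketch):

  # Multivariate normal density, evaluated row-wise for an n x d matrix x
  dmvn <- function(x, mu, S) {
    d    <- ncol(x)
    diff <- sweep(x, 2, mu)
    exp(-0.5 * rowSums((diff %*% solve(S)) * diff)) / sqrt((2 * pi)^d * det(S))
  }

  # EM for a J-component Gaussian mixture, following steps 1-4 above
  em_mixture <- function(x, J = 2, maxit = 200, tol = 1e-8) {
    n    <- nrow(x)
    grp  <- sample(1:J, n, replace = TRUE)           # 1. random initial memberships
    pi_j <- tabulate(grp, J) / n
    mu   <- lapply(1:J, function(j) colMeans(x[grp == j, , drop = FALSE]))
    S    <- lapply(1:J, function(j) cov(x[grp == j, , drop = FALSE]))
    for (it in 1:maxit) {
      # 2. E-step: xi[i, j] = P(obs. i in group j | current parameter values)
      dens <- sapply(1:J, function(j) pi_j[j] * dmvn(x, mu[[j]], S[[j]]))
      xi   <- dens / rowSums(dens)
      # 3. M-step: update pi_j, mu_j, Sigma_j, weighting each obs. by xi
      old <- unlist(mu)
      for (j in 1:J) {
        w       <- xi[, j]
        pi_j[j] <- mean(w)
        mu[[j]] <- colSums(w * x) / sum(w)
        cx      <- sweep(x, 2, mu[[j]])
        S[[j]]  <- crossprod(cx, w * cx) / sum(w)
      }
      # 4. repeat steps 2 and 3 to convergence (here judged by the means)
      if (max(abs(unlist(mu) - old)) < tol) break
    }
    list(pi = pi_j, mu = mu, Sigma = S, xi = xi)
  }

On the simulated mixture from the earlier sketch, em_mixture(x, J = 2) should recover component means near (0, 0) and (3, 3); the columns of xi give each observation's estimated membership probabilities.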

Page 24
[Figure: plotting character and color indicate most likely component]

Page 25

The EM (Baum-Welch) Algorithm – maximization made easier
with Zm = latent (unobserved) data;  T = (Z, Zm) = complete data

1. Start with initial guesses θ^(0) for the parameters

2. Expectation: at the kth iteration, compute
     Q(θ', θ̂^(k)) = E[ l(θ' | T) | Z, θ̂^(k) ]

3. Maximization: obtain the estimate θ̂^(k+1) by maximizing Q(θ', θ̂^(k)) over θ'

4. Iterate steps 2 and 3 to convergence ($?)

Page 26

Beetle Data – Notation

Observed values:
  Nj  = # receiving dosage j
  nij = # survivors at dosage j with genotype i

Unobserved (latent) values:
  Nij = # receiving dosage j with genotype i

If Nij had been observed:
  nij ~ Binomial( Nij, 1 − pij ),   pij = Prob. of death at dosage j for genotype i

How Nij can be [latently] considered:
  (N1j, ..., N6j) ~ Multinomial( Nj, P ),   Pi = prop. of population with genotype i

Page 27

Likelihood Function

Parameters θ = (p, P) and complete data T = (n, N):

  l(θ | T) = log f(n, N | p, P)

After simplification:

  l(θ | T) = Σj [ log Nj! + Σi { Nij log Pi − log nij! − log(Nij − nij)!
                                 + nij log(1 − pij) + (Nij − nij) log pij } ]

Mechanism of missing data suggests EM algorithm

Page 28

Missing at Random (MAR)

Necessary assumption for usual EM applications

Covariate x is MAR if probability of observing x does not depend on x or any other unobserved covariate, but may depend on response and other observed covariates (Ibrahim 1990)

Here – genotype is observed only for survivors, and for all subjects at zero dosage

Page 29

Initialization Step

Two classes of marginal information here:
  For all dosage levels j – observe Nj
  At zero dosage level – observe Ni,0 for genotype i  (allows an estimate of Pi)

Consider the marginal dist'n. of the missing categorical covariate (genotype).
Using the zero dosage level:

  P̂i^(0) = Ni,0 / N0 ,    p̂ij^(0) = 0.5

This is the key – the marginal distribution of the missing categorical covariate

Page 30

Expectation Step

Dropping the "constants" log Nj! and log nij! :

  Q̃^(k) = Σi,j { Ñij^(k) log Pi + nij log(1 − pij) + (Ñij^(k) − nij) log pij − L̃ij^(k) }

Need to evaluate:

  Ñij^(k) = E[ Nij | n, θ̂^(k) ]    and    L̃ij^(k) = E[ log(Nij − nij)! | n, θ̂^(k) ]

(*)

Page 31

Expectation Step

Bayes Formula:

  h(Nj | nj) = f(nj | Nj) f(Nj) / Σ(Nj) f(nj | Nj) f(Nj)

Multinomial:

  Nj | nj ~ Multinomial( Nj − Σi nij , · )   [the unobserved deaths at dosage j, allocated across genotypes]

  Ñij^(k) = E[ Nij | n, θ̂^(k) ] = nij + ( Nj − Σl nlj ) · P̂i^(k) p̂ij^(k) / Σl P̂l^(k) p̂lj^(k)

(*)
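In R this E-step update is a line of vectorized arithmetic, given current parameter estimates. A hedged sketch reusing the beetles and surv objects entered earlier; Phat (a length-6 vector of genotype proportions) and phat (an 11 x 6 matrix of death probabilities) are assumed to hold the current estimates and are my own names:

  # Ntilde[j, i] = n_ij + (deaths at dose j) * P_i * p_ij / sum_l P_l * p_lj
  num    <- sweep(phat, 2, Phat, "*")          # P_i * p_ij, dose by genotype
  Ntilde <- surv + beetles$deaths * (num / rowSums(num))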

Page 32

Expectation Step

For L̃ij^(k) = E[ log(Nij − nij)! | n, θ̂^(k) ] :
  Not needed for maximization – only affects EM convergence rate
  Direct calculation from the multinomial dist'n. is "possible" – but computationally prohibitive
  Need to employ some approximation strategy:
    second-order Taylor series about Ñij^(k) − nij , using Binet's formula

  log(N − n)! ≈ ( N − n + 1/2 ) log( N − n + 1 ) − ( N − n + 1 ) + (1/2) log(2π)

(*)
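Binet's (Stirling-type) approximation is easy to sanity-check in R (an illustrative check only):

  # Compare exact log(m!) with the Binet-style approximation used above
  m      <- 0:50
  exact  <- lfactorial(m)                                    # log(m!)
  approx <- (m + 0.5) * log(m + 1) - (m + 1) + 0.5 * log(2 * pi)
  max(abs(exact - approx))     # worst error is at m = 0, roughly 0.08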

Page 33

Expectation Step

Consider Binet's formula (like Stirling's):

  log(N − n)! ≈ ( N − n + 1/2 ) log( N − n + 1 ) − ( N − n + 1 ) + (1/2) log(2π)

Have:

  L̃ij^(k) = E[ log(Nij − nij)! | n, θ̂^(k) ]

Use a second-order Taylor series approximation of log(Nij − nij)! , as a function of Nij − nij , taken about Ñij^(k) − nij ; evaluating its expectation then requires

  E[ Nij − nij | n, θ̂^(k) ]    and    E[ (Nij − nij)² | n, θ̂^(k) ]

[Figure: log(N − n)! plotted against N − n]

(*)

Page 34

Maximization Step

Portion of Q̃^(k) related to P :

  Q̃P^(k) = Σi,j Ñij^(k) log Pi    →    P̂^(k+1) by Lagrange multipliers

Portion of Q̃^(k) related to p :

  Q̃p^(k) = Σi,j { nij log(1 − pij) + (Ñij^(k) − nij) log pij }    →    p̂^(k+1) by Newton-Raphson iterations, with some parameterization of pij in terms of (Gi, Di)

(*)
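The Lagrange-multiplier update for P has the usual closed form (each genotype's share of the expected totals), and the p update can be done with any numerical maximizer. A hedged sketch continuing the earlier objects (Ntilde from the E-step sketch; the specific logit parameterization and the use of optim() in place of Newton-Raphson are my simplifications, not necessarily what the speaker used):

  # M-step, part 1: closed-form update of the genotype proportions P_i
  Phat <- colSums(Ntilde) / sum(Ntilde)

  # M-step, part 2: update the death probabilities p_ij under an assumed
  # genotype-specific model logit(p_ij) = a_i + b_i * dose_j, by maximizing
  # the binomial portion of Q (optim() standing in for Newton-Raphson)
  negQ <- function(theta) {
    a <- theta[1:6]; b <- theta[7:12]
    p <- plogis(outer(beetles$dose, b) + rep(a, each = nrow(Ntilde)))
    p <- pmin(pmax(p, 1e-12), 1 - 1e-12)       # keep logs finite
    -sum(surv * log(1 - p) + (Ntilde - surv) * log(p))
  }
  theta_hat <- optim(rep(0, 12), negQ, method = "BFGS")$par
  phat      <- plogis(outer(beetles$dose, theta_hat[7:12]) +
                      rep(theta_hat[1:6], each = nrow(Ntilde)))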

Page 35

Convergence

[Figure: expected log-likelihood Q versus EM iteration – "EM Convergence with Criterion 1e-12: 1639 Iterations in 52 Seconds"]

Page 36

Dose Response Curves (log scale)

[Figure: six panels, one per genotype (-/B, -/H, -/A, +/B, +/H, +/A); x-axis: Dosage (log scale, 0.001 to 1.0), y-axis: Prob. of death]

Page 37

EM Results

test statistic for H0: no dosage effect

separation of points …

Page 38

Topics Used Here

Calculus
  Differentiation & Integration (including vector differentiation)
  Lagrange Multipliers
  Taylor Series Expansions

Linear Algebra
  Determinants & Eigenvalues
  Inverting [computationally/nearly singular] Matrices
  Positive Definiteness

Probability
  Distributions: Multivariate Normal, Binomial, Multinomial
  Bayes Formula

Statistics
  Logistic Regression
  Separation of Points
  [Penalized] Likelihood Maximization
  EM Algorithm

Biology – a little time and communication