1
John R. StevensUtah State University
Notes 2. Statistical Methods I
Mathematics Educators Workshop 28 March 2009
Advanced Statistical Methods:
Beyond Linear Regression
http://www.stat.usu.edu/~jrstevens/pcmi
2
What would your students know to do with these data?
Obs  Flight  Temp  Damage
 1   STS1     66   NO
 2   STS9     70   NO
 3   STS51B   75   NO
 4   STS2     70   YES
 5   STS41B   57   YES
 6   STS51G   70   NO
 7   STS3     69   NO
 8   STS41C   63   YES
 9   STS51F   81   NO
10   STS4     80
11   STS41D   70   YES
12   STS51I   76   NO
13   STS5     68   NO
14   STS41G   78   NO
15   STS51J   79   NO
16   STS6     67   NO
17   STS51A   67   NO
18   STS61A   75   YES
19   STS7     72   NO
20   STS51C   53   YES
21   STS61B   76   NO
22   STS8     73   NO
23   STS51D   67   NO
24   STS61C   58   YES
3
Two Sample t-test
data:  Temp by Damage
t = 3.1032, df = 21, p-value = 0.005383
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  2.774344 14.047085
sample estimates:
 mean in group NO mean in group YES
         72.12500          63.71429
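The output above can be reproduced from the table by hand with a pooled (equal-variance) two-sample t statistic; a minimal Python sketch, with the flight whose damage is unrecorded (STS4) excluded:

```python
import math

# O-ring temperatures from the table, split by the Damage column
no_temps  = [66, 70, 75, 70, 69, 81, 76, 68, 78, 79, 67, 67, 72, 76, 73, 67]
yes_temps = [70, 57, 63, 70, 75, 53, 58]

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)       # within-group sums of squares
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)         # pooled variance estimate
    se = math.sqrt(sp2 * (1 / na + 1 / nb))   # standard error of the difference
    return (ma - mb) / se, na + nb - 2

t, df = pooled_t(no_temps, yes_temps)
print(round(t, 4), df)   # matches the R output: 3.1032 21
```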
4
Does the t-test make sense here?
Traditional: Treatment Group mean vs. Control Group mean
What is the response variable?
  Temperature? [Quantitative, Continuous]
  Damage? [Qualitative]
5
Traditional Statistical Model 1
Linear Regression: predict continuous response from [quantitative] predictors
  Y=weight, X=height
  Y=income, X=education level
  Y=first-semester GPA, X=parent’s income
  Y=temperature, X=damage (0=no, 1=yes)
Can also “control for” other [possibly categorical] factors (“covariates”):
  Sex
  Major
  State of Origin
  Number of Siblings
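For instance, the Y=weight, X=height case can be fit by ordinary least squares; a minimal closed-form sketch (the height/weight numbers here are made up for illustration):

```python
def ols(x, y):
    """Closed-form simple linear regression Y = b0 + b1*X; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope: change in Y per unit X
    return ybar - b1 * xbar, b1

# hypothetical height (in) / weight (lb) pairs, illustration only
b0, b1 = ols([60, 62, 65, 68, 70, 72], [115, 120, 135, 150, 160, 172])
print(b0, b1)   # positive slope: weight increases with height
```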
6
Traditional Statistical Model 2
Logistic Regression: predict binary response from [quantitative] predictors
  Y=‘graduate within 5 years’=0 vs. Y=‘not’=1, X=first-semester GPA
  Y=0 (no damage) vs. Y=1 (damage), X=temperature
  Y=0 (survive) vs. Y=1 (death), X=dosage (dose-response model)
Can also “control” for other factors, or “covariates”:
  Race, Sex
  Genotype
p = P(Y=1 | relevant factors) = prob. that Y=1, given state of relevant factors
7
Traditional Dose-Response Model
p = Probability of “death” at dose d:

log( p / (1 − p) ) = β0 + β1 d

Look at what affects the shape of the curve, LD50 (dose lethal to 50% of subjects), etc.
[Figure: Dose-Response Curve – S-shaped plot of p (0 to 1) vs. dose d]
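As a sketch of the curve and its LD50 = −β0/β1 (the coefficients β0 = −4, β1 = 2 here are made up purely to illustrate the shape, not taken from the slides):

```python
import math

def p_death(d, b0=-4.0, b1=2.0):
    """Logistic dose-response curve: log(p/(1-p)) = b0 + b1*d."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * d)))

# LD50 solves b0 + b1*d = 0, i.e. d = -b0/b1
ld50 = -(-4.0) / 2.0
print(ld50, p_death(ld50))   # the curve crosses p = 0.5 at d = LD50
```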
8
“Fitting” the Dose-Response Model
Why “logistic” regression? The model is linear in the log-odds (logit):

log( p / (1 − p) ) = β0 + β1 d

β0 = place-holder constant
β1 = effect of “dosage” d
To estimate parameters:
  Newton-Raphson iterative process to “maximize the likelihood” of the model
  Compare Y=0 (no damage) with Y=1 (damage) groups
9
Likelihood Function (to be maximized)

Y_i ∈ {0, 1},  Pr(Y_i = 1) = p_i,  Pr(Y_i = 0) = 1 − p_i

likelihood for obs. i:  f(y_i) = p_i^y_i (1 − p_i)^(1 − y_i)

multiply probabilities (independence):

L(β0, β1) = Π_i p_i^y_i (1 − p_i)^(1 − y_i)

l(β0, β1) = log L(β0, β1)
10
Estimation by IRLS
Iteratively Reweighted Least Squares
  equivalent: Newton-Raphson algorithm for iteratively solving “score” equations

∂ l(β0, β1) / ∂ βj = 0
11
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  15.0429     7.3786   2.039   0.0415 *
Temp         -0.2322     0.1082  -2.145   0.0320 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
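The Newton-Raphson/IRLS fit described above can be sketched directly on the O-ring data; this minimal Python version (my own implementation, not the talk's code) should reproduce the glm estimates:

```python
import math

# O-ring data from the earlier table, damage coded 1 = YES / 0 = NO
# (flight STS4, with damage unrecorded, is excluded)
temps  = [66, 70, 75, 70, 57, 70, 69, 63, 81, 70, 76, 68,
          78, 79, 67, 67, 75, 72, 53, 76, 73, 67, 58]
damage = [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0,
          0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1]

def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson (equivalently IRLS) for log(p/(1-p)) = b0 + b1*x."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        u0 = u1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)             # IRLS weight
            u0 += yi - p                  # score: dl/db0
            u1 += (yi - p) * xi           # score: dl/db1
            h00 += w                      # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * u0 - h01 * u1) / det   # Newton step: solve I * step = score
        b1 += (h00 * u1 - h01 * u0) / det
    return b0, b1

b0, b1 = fit_logistic(temps, damage)
print(round(b0, 4), round(b1, 4))   # close to the R output: 15.0429 -0.2322
```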
12
p̂ = exp(15.0429 − 0.2322·Temp) / (1 + exp(15.0429 − 0.2322·Temp))
13
What if the data were even “better”?
Complete separation of points
What should happen to our “slope” estimate?
14
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    928.9   913821.4   0.001        1
Temp           -14.4    14106.7  -0.001        1

p̂ = exp(928.9 − 14.4·Temp) / (1 + exp(928.9 − 14.4·Temp))
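Why the slope runs away can be seen numerically. With completely separated data (a hypothetical, idealized version of the O-ring data in which damage occurs exactly when temp < 70), the log-likelihood keeps increasing as the boundary steepens, so no finite maximizer exists:

```python
import math

# hypothetical, perfectly separated data: damage iff temp < 70
temps  = [53, 57, 58, 63, 66, 67, 69, 70, 72, 75, 78, 81]
damage = [1 if t < 70 else 0 for t in temps]

def loglik(b0, b1):
    """Bernoulli log-likelihood for log(p/(1-p)) = b0 + b1*temp."""
    ll = 0.0
    for t, y in zip(temps, damage):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
        ll += math.log(p if y == 1 else 1.0 - p)
    return ll

# steepen the decision boundary at temp = 69.5: intercept 69.5*c, slope -c
for c in (0.5, 1.0, 2.0, 4.0):
    print(c, loglik(69.5 * c, -c))   # log-likelihood climbs toward 0 as c grows
```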
15
Failure?
Shape of likelihood function
Large Standard Errors
Solution only in 2006
Rather than maximizing likelihood, consider a penalty:

l̃(β0, β1) = l(β0, β1) + .5 / (“magnitude of variance” of β̂0, β̂1)
16
Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood

             coef        se(coef)   Chisq    p
(Intercept)  30.4129282  16.5145441 11.35235 0.0007535240
Temp         -0.4832632   0.2528934 13.06178 0.0003013835
p̂ = exp(30.4129 − 0.4833·Temp) / (1 + exp(30.4129 − 0.4833·Temp))
17
Beetle Data
Phosphine   Total      Total   Total      Survivors Observed at Genotype
Dosage      Receiving  Deaths  Survivors  -/B  -/H  -/A  +/B  +/H  +/A
(mg/L)      Dosage
0              98         0      98        31   27   10    6   20    4
0.003         100        16      84        18   26   10    6   20    4
0.004         100        68      32        10    4    3    5    7    4
0.005         100        78      22         1    4    7    2    6    2
0.01          100        77      23         0    1    9    8    5    0
0.05          300       270      30         0    0    0    5   20    5
0.1           400       383      17         0    0    0    0   10    7
0.2           750       740      10         0    0    0    0    0   10
0.3           500       490      10         0    0    0    0    0   10
0.4           500       492       8         0    0    0    0    0    8
1.0         7,850     7,806      44         0    0    0    0    0   44
Total      10,798    10,420     378
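The table's margins can be checked directly from the transcribed rows:

```python
# rows of the beetle table: (dosage mg/L, total receiving, deaths, survivors)
rows = [
    (0.0,     98,    0,  98),
    (0.003,  100,   16,  84),
    (0.004,  100,   68,  32),
    (0.005,  100,   78,  22),
    (0.01,   100,   77,  23),
    (0.05,   300,  270,  30),
    (0.1,    400,  383,  17),
    (0.2,    750,  740,  10),
    (0.3,    500,  490,  10),
    (0.4,    500,  492,   8),
    (1.0,   7850, 7806,  44),
]

total_n = sum(r[1] for r in rows)
total_d = sum(r[2] for r in rows)
total_s = sum(r[3] for r in rows)
print(total_n, total_d, total_s)   # 10798 10420 378, matching the table margins
```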
18
Dose-response model
Recall simple model:

log( p / (1 − p) ) = β0 + β1 d

Extend to include genotype:

p_ij = Pr(Y=1 | dosage level j and genotype level i)

log( p_ij / (1 − p_ij) ) = β0 + G_i + D_i d_j

But – when is genotype (covariate G_i) observed?
19
Coefficients:
               Estimate  Std. Error    z value Pr(>|z|)
(Intercept)  -2.657e+01   8.901e+04  -2.98e-04        1
dose         -7.541e-26   1.596e+07  -4.72e-33        1
G1+          -3.386e-28   1.064e+05  -3.18e-33        1
G2B          -1.344e-14   1.092e+05  -1.23e-19        1
G2H          -3.349e-28   1.095e+05  -3.06e-33        1
dose:G1+      7.541e-26   1.596e+07   4.72e-33        1
dose:G2B      3.984e-12   3.075e+07   1.30e-19        1
dose:G2H      7.754e-26   2.760e+07   2.81e-33        1
G1+:G2B       1.344e-14   1.465e+05   9.17e-20        1
G1+:G2H       3.395e-28   1.327e+05   2.56e-33        1
dose:G1+:G2B -3.984e-12   3.098e+07  -1.29e-19        1
dose:G1+:G2H -7.756e-26   2.763e+07  -2.81e-33        1
Before we “fix” this, first a little detour …
20
A Multivariate Gaussian Mixture
Component j is MVN(μj, Σj) with proportion πj
21
The Maximum Likelihood Approach
22
A Possible Work-Around
Complete-data log-likelihood, with Δ_ij = I{obs. i in group j}:

l(θ; y, Δ) = Σ_{i=1}^{n} Σ_{j=1}^{J} Δ_ij [ log π_j + log φ(y_i | μ_j, Σ_j) ]

Keys here:
1. the true group memberships Δ are unknown (latent)
2. statisticians specialize in unknown quantities
23
A reasonable approach
1. Randomly assign group memberships Δ, and estimate group means μj, covariance matrices Σj, and mixing proportions πj
2. Given those values, calculate (for each obs. i) ξij = E[Δij | θ] = P(obs. i in group j)
3. Update estimates for μj, Σj, and πj, weighting each observation by these ξij; e.g.,

μj = Σi ξij yi / Σi ξij

4. Repeat steps 2 and 3 to convergence
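The four steps above can be sketched in one dimension (univariate normals in place of the MVN components; the data here are simulated for illustration, not from the talk):

```python
import math, random

def em_gmm_1d(y, n_iter=100):
    """EM sketch of steps 1-4 for a two-component 1-D Gaussian mixture."""
    mu = [min(y), max(y)]                 # step 1: crude starting values
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # step 2 (E): xi[i][j] = P(obs. i in group j | current parameters)
        xi = []
        for yi in y:
            dens = [pi[j] * math.exp(-(yi - mu[j]) ** 2 / (2 * var[j]))
                    / math.sqrt(2 * math.pi * var[j]) for j in (0, 1)]
            s = dens[0] + dens[1]
            xi.append([dens[0] / s, dens[1] / s])
        # step 3 (M): update mu, var, pi, weighting each obs. by xi
        for j in (0, 1):
            w = sum(row[j] for row in xi)
            mu[j] = sum(row[j] * yi for row, yi in zip(xi, y)) / w
            var[j] = sum(row[j] * (yi - mu[j]) ** 2 for row, yi in zip(xi, y)) / w
            pi[j] = w / len(y)
    return mu, var, pi                    # step 4: loop above runs to n_iter

random.seed(1)
y = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(10, 1) for _ in range(200)]
mu, var, pi = em_gmm_1d(y)
print(sorted(round(m, 2) for m in mu))   # near the true means 0 and 10
```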
24
[Figure: plotting character and color indicate most likely component]
25
The EM (Baum-Welch) Algorithm
- maximization made easier with Zm = latent (unobserved) data; T = (Z, Zm) = complete data
1. Start with initial guesses θ̂(0) for parameters
2. Expectation: At the kth iteration, compute

Q(θ', θ̂(k)) = E[ l(θ'; T) | Z, θ̂(k) ]

3. Maximization: Obtain estimate θ̂(k+1) by maximizing Q(θ', θ̂(k)) over θ'
4. Iterate steps 2 and 3 to convergence
26
Beetle Data – Notation
Observed values:

N_j = # receiving dosage j
n_ij = # survivors at dosage j with genotype i

Unobserved (latent) values:

Ñ_ij = # receiving dosage j with genotype i

If Ñ_ij had been observed:

n_ij ~ Binomial( Ñ_ij , 1 − p_ij ) ,  p_ij = Prob. of death at dosage j for genotype i

How Ñ_ij can be [latently] considered:

(Ñ_1j, ..., Ñ_6j) ~ Multinomial( N_j , (P_1, ..., P_6) ) ,  P_i = prop. of population with genotype i
27
Likelihood Function
Parameters θ = (p, P) and complete data T = (n, Ñ)

l(θ | T) = log f(n, Ñ | p, P)

After simplification:

l(θ | T) = Σ_j [ log N_j! + Σ_i { Ñ_ij log P_i − log n_ij! − log (Ñ_ij − n_ij)! + n_ij log(1 − p_ij) + (Ñ_ij − n_ij) log p_ij } ]

Mechanism of missing data suggests EM algorithm
28
Missing at Random (MAR)
Necessary assumption for usual EM applications
Covariate x is MAR if probability of observing x does not depend on x or any other unobserved covariate, but may depend on response and other observed covariates (Ibrahim 1990)
Here – genotype is observed only for survivors, and for all subjects at zero dosage
29
Initialization Step
Two classes of marginal information here:
  For all dosage levels j – observe N_j
  At zero dosage level – observe n_i,0 for genotype i; allows estimate of P_i
Consider marginal distn. of missing categorical covariate (genotype)
Using zero dosage level:

P̂_i^(0) = n_i,0 / N_0 ,  p̂_ij^(0) = 0.5

This is the key – the marginal distribution of the missing categorical covariate
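A sketch of this initialization from the zero-dosage row of the beetle table (all 98 beetles survive there, so every genotype is observed; variable names here are mine):

```python
# genotype counts among the 98 zero-dosage survivors, from the beetle table
n_i0 = {"-/B": 31, "-/H": 27, "-/A": 10, "+/B": 6, "+/H": 20, "+/A": 4}
N_0 = sum(n_i0.values())                           # 98 receiving zero dosage

P_hat0 = {g: c / N_0 for g, c in n_i0.items()}     # P_i^(0) = n_{i,0} / N_0
p_hat0 = 0.5                                       # flat starting death prob.
print(N_0, round(P_hat0["-/B"], 4))
```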
30
Expectation Step
Dropping “constants” log N_j! and log n_ij! :

Q̃(k) = Σ_{i,j} { Ñ_ij^(k) log P_i + n_ij log(1 − p_ij) + (Ñ_ij^(k) − n_ij) log p_ij − L_ij^(k) }

Need to evaluate:

Ñ_ij^(k) = E[ Ñ_ij | n, θ̂(k) ] ,  L_ij^(k) = E[ log (Ñ_ij − n_ij)! | n, θ̂(k) ]

(*)
31
Expectation Step

Bayes Formula:

h(Ñ_j | n_j, N_j) = f(n_j | Ñ_j) f(Ñ_j | N_j) / Σ_{Ñ_j} f(n_j | Ñ_j) f(Ñ_j | N_j)

This conditional distribution is Multinomial:

(Ñ_j − n_j) | n_j, N_j ~ Multinomial( N_j − Σ_l n_lj , π_j ) ,  with π_ij ∝ P_i p_ij

So:

Ñ_ij^(k) = n_ij + ( N_j − Σ_l n_lj ) · P̂_i^(k) p̂_ij^(k) / Σ_l P̂_l^(k) p̂_lj^(k)

(*)
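Based on this reconstructed update (unobserved deaths at dosage j allocated to genotypes in proportion to P_i p_ij), a toy numeric sketch; the numbers here are made up and the property checked is that the expected counts add back to N_j:

```python
# hypothetical current estimates at one dosage level j, three genotypes
P = [0.5, 0.3, 0.2]      # genotype proportions P_i^(k)
p = [0.9, 0.6, 0.2]      # death probabilities p_ij^(k)
n = [2, 5, 10]           # observed survivors n_ij
N_j = 100                # total receiving this dosage

deaths = N_j - sum(n)                      # unobserved deaths to allocate
w = [P[i] * p[i] for i in range(3)]        # allocation weights P_i * p_ij
s = sum(w)
N_tilde = [n[i] + deaths * w[i] / s for i in range(3)]
print(N_tilde, sum(N_tilde))               # expected counts sum back to N_j
```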
32
Expectation Step
For L_ij^(k) = E[ log (Ñ_ij − n_ij)! | n, θ̂(k) ] :
  Not needed for maximization – only affects EM convergence rate
  Direct calculation from multinomial distn. is “possible” – but computationally prohibitive
  Need to employ some approximation strategy:
  second-order Taylor series about Ñ_ij^(k) − n_ij, using Binet’s formula

log (N − n)! ≈ (N − n + ½) log(N − n + 1) − (N − n + 1) + ½ log(2π)

(*)
33
Expectation Step
Consider Binet’s formula (like Stirling’s):

log (N − n)! ≈ (N − n + ½) log(N − n + 1) − (N − n + 1) + ½ log(2π)

Have:

L_ij^(k) = E[ log (Ñ_ij − n_ij)! | n, θ̂(k) ]

Use a second-order Taylor series approximation, taken about Ñ_ij^(k) − n_ij as a function of (Ñ_ij − n_ij), requiring only E[ Ñ_ij − n_ij | n, θ̂(k) ] and E[ (Ñ_ij − n_ij)² | n, θ̂(k) ]

[Figure: log(N − n)! vs. N − n, for N − n from 0 to 50]

(*)
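A quick numerical check of this Binet/Stirling-type approximation (as reconstructed above, including the ½log(2π) constant), against the exact log-factorial:

```python
import math

def approx_log_factorial(z):
    """log(z!) ≈ (z + 1/2) log(z + 1) - (z + 1) + (1/2) log(2*pi)."""
    return (z + 0.5) * math.log(z + 1) - (z + 1) + 0.5 * math.log(2 * math.pi)

for z in (5, 10, 50):
    exact = math.lgamma(z + 1)     # exact log(z!)
    print(z, exact, approx_log_factorial(z))   # already close by z = 10
```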
34
Maximization Step

Portion of Q̃(k) related to P:

Q̃_P^(k) = Σ_{i,j} Ñ_ij^(k) log P_i  (with Σ_i P_i = 1)  →  P̂^(k+1) by Lagrange multipliers

Portion of Q̃(k) related to p:

Q̃_p^(k) = Σ_{i,j} [ n_ij log(1 − p_ij) + (Ñ_ij^(k) − n_ij) log p_ij ]  →  p̂^(k+1) by Newton-Raphson iterations, with some parameterization (G_i, D_i)

(*)
35
Convergence
[Figure: expected log-likelihood Q vs. EM iteration]
EM Convergence with Criterion 1e-12: 1639 Iterations in 52 Seconds
36
Dose Response Curves (log scale)
[Figure: six panels of fitted Prob. of death vs. Dosage (0.001 to 1.0, log scale), one panel per genotype: -/B, -/H, -/A, +/B, +/H, +/A]
37
EM Results
Test statistic for H0: no dosage effect
Separation of points …
38
Topics Used Here
Calculus
  Differentiation & Integration (including vector differentiation)
  Lagrange Multipliers
  Taylor Series Expansions
Linear Algebra
  Determinants & Eigenvalues
  Inverting [computationally/nearly singular] Matrices
  Positive Definiteness
Probability
  Distributions: Multivariate Normal, Binomial, Multinomial
  Bayes Formula
Statistics
  Logistic Regression
  Separation of Points
  [Penalized] Likelihood Maximization
  EM Algorithm
Biology – a little time and communication