1
John R. StevensUtah State University
Notes 2. Statistical Methods I
Mathematics Educators Workshop 28 March 2009
Advanced Statistical Methods:
Beyond Linear Regression
http://www.stat.usu.edu/~jrstevens/pcmi
2
What would your students know to do with these data?
Obs  Flight  Temp  Damage
 1   STS1     66   NO
 2   STS9     70   NO
 3   STS51B   75   NO
 4   STS2     70   YES
 5   STS41B   57   YES
 6   STS51G   70   NO
 7   STS3     69   NO
 8   STS41C   63   YES
 9   STS51F   81   NO
10   STS4     80
11   STS41D   70   YES
12   STS51I   76   NO
13   STS5     68   NO
14   STS41G   78   NO
15   STS51J   79   NO
16   STS6     67   NO
17   STS51A   67   NO
18   STS61A   75   YES
19   STS7     72   NO
20   STS51C   53   YES
21   STS61B   76   NO
22   STS8     73   NO
23   STS51D   67   NO
24   STS61C   58   YES
3
Two Sample t-test
data:  Temp by Damage
t = 3.1032, df = 21, p-value = 0.005383
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  2.774344 14.047085
sample estimates:
 mean in group NO mean in group YES
         72.12500          63.71429
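The output above can be reproduced from the table by hand with a pooled (equal-variance) two-sample t statistic; a minimal Python sketch, with the flight whose damage is unrecorded (STS4) excluded:

```python
import math

# O-ring temperatures from the table, split by the Damage column
no_temps  = [66, 70, 75, 70, 69, 81, 76, 68, 78, 79, 67, 67, 72, 76, 73, 67]
yes_temps = [70, 57, 63, 70, 75, 53, 58]

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)       # within-group sums of squares
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)         # pooled variance estimate
    se = math.sqrt(sp2 * (1 / na + 1 / nb))   # standard error of the difference
    return (ma - mb) / se, na + nb - 2

t, df = pooled_t(no_temps, yes_temps)
print(round(t, 4), df)   # matches the R output: 3.1032 21
```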
4
Does the t-test make sense here?
Traditional: Treatment Group mean vs. Control Group mean
What is the response variable?
  Temperature? [Quantitative, Continuous]
  Damage? [Qualitative]
5
Traditional Statistical Model 1
Linear Regression: predict continuous response from [quantitative] predictors
  Y=weight, X=height
  Y=income, X=education level
  Y=first-semester GPA, X=parent’s income
  Y=temperature, X=damage (0=no, 1=yes)
Can also “control for” other [possibly categorical] factors (“covariates”):
  Sex
  Major
  State of Origin
  Number of Siblings
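For instance, the Y=weight, X=height case can be fit by ordinary least squares; a minimal closed-form sketch (the height/weight numbers here are made up for illustration):

```python
def ols(x, y):
    """Closed-form simple linear regression Y = b0 + b1*X; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope: change in Y per unit X
    return ybar - b1 * xbar, b1

# hypothetical height (in) / weight (lb) pairs, illustration only
b0, b1 = ols([60, 62, 65, 68, 70, 72], [115, 120, 135, 150, 160, 172])
print(b0, b1)   # positive slope: weight increases with height
```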
6
Traditional Statistical Model 2
Logistic Regression: predict binary response from [quantitative] predictors
  Y=‘graduate within 5 years’=0 vs. Y=‘not’=1, X=first-semester GPA
  Y=0 (no damage) vs. Y=1 (damage), X=temperature
  Y=0 (survive) vs. Y=1 (death), X=dosage (dose-response model)
Can also “control” for other factors, or “covariates”:
  Race, Sex
  Genotype
p = P(Y=1 | relevant factors) = prob. that Y=1, given state of relevant factors
7
Traditional Dose-Response Model
p = Probability of “death” at dose d:

log( p / (1 − p) ) = β0 + β1 d

Look at what affects the shape of the curve, LD50 (dose lethal to 50% of subjects), etc.
[Figure: Dose-Response Curve – S-shaped plot of p (0 to 1) vs. dose d]
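As a sketch of the curve and its LD50 = −β0/β1 (the coefficients β0 = −4, β1 = 2 here are made up purely to illustrate the shape, not taken from the slides):

```python
import math

def p_death(d, b0=-4.0, b1=2.0):
    """Logistic dose-response curve: log(p/(1-p)) = b0 + b1*d."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * d)))

# LD50 solves b0 + b1*d = 0, i.e. d = -b0/b1
ld50 = -(-4.0) / 2.0
print(ld50, p_death(ld50))   # the curve crosses p = 0.5 at d = LD50
```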
8
“Fitting” the Dose-Response Model
Why “logistic” regression? The model is linear in the log-odds (logit):

log( p / (1 − p) ) = β0 + β1 d

β0 = place-holder constant
β1 = effect of “dosage” d
To estimate parameters:
  Newton-Raphson iterative process to “maximize the likelihood” of the model
  Compare Y=0 (no damage) with Y=1 (damage) groups
9
Likelihood Function (to be maximized)

Y_i ∈ {0, 1},  Pr(Y_i = 1) = p_i,  Pr(Y_i = 0) = 1 − p_i

likelihood for obs. i:  f(y_i) = p_i^y_i (1 − p_i)^(1 − y_i)

multiply probabilities (independence):

L(β0, β1) = Π_i p_i^y_i (1 − p_i)^(1 − y_i)

l(β0, β1) = log L(β0, β1)
10
Estimation by IRLS
Iteratively Reweighted Least Squares
  equivalent: Newton-Raphson algorithm for iteratively solving “score” equations

∂ l(β0, β1) / ∂ βj = 0
11
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  15.0429     7.3786   2.039   0.0415 *
Temp         -0.2322     0.1082  -2.145   0.0320 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
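The Newton-Raphson/IRLS fit described above can be sketched directly on the O-ring data; this minimal Python version (my own implementation, not the talk's code) should reproduce the glm estimates:

```python
import math

# O-ring data from the earlier table, damage coded 1 = YES / 0 = NO
# (flight STS4, with damage unrecorded, is excluded)
temps  = [66, 70, 75, 70, 57, 70, 69, 63, 81, 70, 76, 68,
          78, 79, 67, 67, 75, 72, 53, 76, 73, 67, 58]
damage = [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0,
          0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1]

def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson (equivalently IRLS) for log(p/(1-p)) = b0 + b1*x."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        u0 = u1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)             # IRLS weight
            u0 += yi - p                  # score: dl/db0
            u1 += (yi - p) * xi           # score: dl/db1
            h00 += w                      # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * u0 - h01 * u1) / det   # Newton step: solve I * step = score
        b1 += (h00 * u1 - h01 * u0) / det
    return b0, b1

b0, b1 = fit_logistic(temps, damage)
print(round(b0, 4), round(b1, 4))   # close to the R output: 15.0429 -0.2322
```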
12
p̂ = exp(15.0429 − 0.2322·Temp) / (1 + exp(15.0429 − 0.2322·Temp))
13
What if the data were even “better”?
Complete separation of points
What should happen to our “slope” estimate?
14
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    928.9   913821.4   0.001        1
Temp           -14.4    14106.7  -0.001        1

p̂ = exp(928.9 − 14.4·Temp) / (1 + exp(928.9 − 14.4·Temp))
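Why the slope runs away can be seen numerically. With completely separated data (a hypothetical, idealized version of the O-ring data in which damage occurs exactly when temp < 70), the log-likelihood keeps increasing as the boundary steepens, so no finite maximizer exists:

```python
import math

# hypothetical, perfectly separated data: damage iff temp < 70
temps  = [53, 57, 58, 63, 66, 67, 69, 70, 72, 75, 78, 81]
damage = [1 if t < 70 else 0 for t in temps]

def loglik(b0, b1):
    """Bernoulli log-likelihood for log(p/(1-p)) = b0 + b1*temp."""
    ll = 0.0
    for t, y in zip(temps, damage):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
        ll += math.log(p if y == 1 else 1.0 - p)
    return ll

# steepen the decision boundary at temp = 69.5: intercept 69.5*c, slope -c
for c in (0.5, 1.0, 2.0, 4.0):
    print(c, loglik(69.5 * c, -c))   # log-likelihood climbs toward 0 as c grows
```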
15
Failure?
Shape of likelihood function
Large Standard Errors
Solution only in 2006
Rather than maximizing likelihood, consider a penalty:

l̃(β0, β1) = l(β0, β1) + .5 / (“magnitude of variance” of β̂0, β̂1)
16
Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood

             coef        se(coef)   Chisq    p
(Intercept)  30.4129282  16.5145441 11.35235 0.0007535240
Temp         -0.4832632   0.2528934 13.06178 0.0003013835
p̂ = exp(30.4129 − 0.4833·Temp) / (1 + exp(30.4129 − 0.4833·Temp))
17
Beetle Data
Phosphine   Total      Total   Total      Survivors Observed at Genotype
Dosage      Receiving  Deaths  Survivors  -/B  -/H  -/A  +/B  +/H  +/A
(mg/L)      Dosage
0              98         0      98        31   27   10    6   20    4
0.003         100        16      84        18   26   10    6   20    4
0.004         100        68      32        10    4    3    5    7    4
0.005         100        78      22         1    4    7    2    6    2
0.01          100        77      23         0    1    9    8    5    0
0.05          300       270      30         0    0    0    5   20    5
0.1           400       383      17         0    0    0    0   10    7
0.2           750       740      10         0    0    0    0    0   10
0.3           500       490      10         0    0    0    0    0   10
0.4           500       492       8         0    0    0    0    0    8
1.0         7,850     7,806      44         0    0    0    0    0   44
Total      10,798    10,420     378
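The table's margins can be checked directly from the transcribed rows:

```python
# rows of the beetle table: (dosage mg/L, total receiving, deaths, survivors)
rows = [
    (0.0,     98,    0,  98),
    (0.003,  100,   16,  84),
    (0.004,  100,   68,  32),
    (0.005,  100,   78,  22),
    (0.01,   100,   77,  23),
    (0.05,   300,  270,  30),
    (0.1,    400,  383,  17),
    (0.2,    750,  740,  10),
    (0.3,    500,  490,  10),
    (0.4,    500,  492,   8),
    (1.0,   7850, 7806,  44),
]

total_n = sum(r[1] for r in rows)
total_d = sum(r[2] for r in rows)
total_s = sum(r[3] for r in rows)
print(total_n, total_d, total_s)   # 10798 10420 378, matching the table margins
```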
18
Dose-response model
Recall simple model:

log( p / (1 − p) ) = β0 + β1 d

Extend to include genotype:

p_ij = Pr(Y=1 | dosage level j and genotype level i)

log( p_ij / (1 − p_ij) ) = β0 + G_i + D_i d_j

But – when is genotype (covariate G_i) observed?
19
Coefficients:
               Estimate  Std. Error    z value Pr(>|z|)
(Intercept)  -2.657e+01   8.901e+04  -2.98e-04        1
dose         -7.541e-26   1.596e+07  -4.72e-33        1
G1+          -3.386e-28   1.064e+05  -3.18e-33        1
G2B          -1.344e-14   1.092e+05  -1.23e-19        1
G2H          -3.349e-28   1.095e+05  -3.06e-33        1
dose:G1+      7.541e-26   1.596e+07   4.72e-33        1
dose:G2B      3.984e-12   3.075e+07   1.30e-19        1
dose:G2H      7.754e-26   2.760e+07   2.81e-33        1
G1+:G2B       1.344e-14   1.465e+05   9.17e-20        1
G1+:G2H       3.395e-28   1.327e+05   2.56e-33        1
dose:G1+:G2B -3.984e-12   3.098e+07  -1.29e-19        1
dose:G1+:G2H -7.756e-26   2.763e+07  -2.81e-33        1
Before we “fix” this, first a little detour …
20
A Multivariate Gaussian Mixture
Component j is MVN(μj, Σj) with proportion πj
21
The Maximum Likelihood Approach
22
A Possible Work-Around
Complete-data log-likelihood, with Δ_ij = I{obs. i in group j}:

l(θ; y, Δ) = Σ_{i=1}^{n} Σ_{j=1}^{J} Δ_ij [ log π_j + log φ(y_i | μ_j, Σ_j) ]

Keys here:
1. the true group memberships Δ are unknown (latent)
2. statisticians specialize in unknown quantities
23
A reasonable approach
1. Randomly assign group memberships Δ, and estimate group means μj, covariance matrices Σj, and mixing proportions πj
2. Given those values, calculate (for each obs. i) ξij = E[Δij | θ] = P(obs. i in group j)
3. Update estimates for μj, Σj, and πj, weighting each observation by these ξij; e.g.,

μj = Σi ξij yi / Σi ξij

4. Repeat steps 2 and 3 to convergence
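The four steps above can be sketched in one dimension (univariate normals in place of the MVN components; the data here are simulated for illustration, not from the talk):

```python
import math, random

def em_gmm_1d(y, n_iter=100):
    """EM sketch of steps 1-4 for a two-component 1-D Gaussian mixture."""
    mu = [min(y), max(y)]                 # step 1: crude starting values
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # step 2 (E): xi[i][j] = P(obs. i in group j | current parameters)
        xi = []
        for yi in y:
            dens = [pi[j] * math.exp(-(yi - mu[j]) ** 2 / (2 * var[j]))
                    / math.sqrt(2 * math.pi * var[j]) for j in (0, 1)]
            s = dens[0] + dens[1]
            xi.append([dens[0] / s, dens[1] / s])
        # step 3 (M): update mu, var, pi, weighting each obs. by xi
        for j in (0, 1):
            w = sum(row[j] for row in xi)
            mu[j] = sum(row[j] * yi for row, yi in zip(xi, y)) / w
            var[j] = sum(row[j] * (yi - mu[j]) ** 2 for row, yi in zip(xi, y)) / w
            pi[j] = w / len(y)
    return mu, var, pi                    # step 4: loop above runs to n_iter

random.seed(1)
y = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(10, 1) for _ in range(200)]
mu, var, pi = em_gmm_1d(y)
print(sorted(round(m, 2) for m in mu))   # near the true means 0 and 10
```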
24
[Figure: plotting character and color indicate most likely component]
25
The EM (Baum-Welch) Algorithm
- maximization made easier with Zm = latent (unobserved) data; T = (Z, Zm) = complete data
1. Start with initial guesses θ̂(0) for parameters
2. Expectation: At the kth iteration, compute

Q(θ', θ̂(k)) = E[ l(θ'; T) | Z, θ̂(k) ]

3. Maximization: Obtain estimate θ̂(k+1) by maximizing Q(θ', θ̂(k)) over θ'
4. Iterate steps 2 and 3 to convergence
26
Beetle Data – Notation
Observed values:

N_j = # receiving dosage j
n_ij = # survivors at dosage j with genotype i

Unobserved (latent) values:

Ñ_ij = # receiving dosage j with genotype i

If Ñ_ij had been observed:

n_ij ~ Binomial( Ñ_ij , 1 − p_ij ) ,  p_ij = Prob. of death at dosage j for genotype i

How Ñ_ij can be [latently] considered:

(Ñ_1j, ..., Ñ_6j) ~ Multinomial( N_j , (P_1, ..., P_6) ) ,  P_i = prop. of population with genotype i
27
Likelihood Function
Parameters θ = (p, P) and complete data T = (n, Ñ)

l(θ | T) = log f(n, Ñ | p, P)

After simplification:

l(θ | T) = Σ_j [ log N_j! + Σ_i { Ñ_ij log P_i − log n_ij! − log (Ñ_ij − n_ij)! + n_ij log(1 − p_ij) + (Ñ_ij − n_ij) log p_ij } ]

Mechanism of missing data suggests EM algorithm
28
Missing at Random (MAR)
Necessary assumption for usual EM applications
Covariate x is MAR if probability of observing x does not depend on x or any other unobserved covariate, but may depend on response and other observed covariates (Ibrahim 1990)
Here – genotype is observed only for survivors, and for all subjects at zero dosage
29
Initialization Step
Two classes of marginal information here:
  For all dosage levels j – observe N_j
  At zero dosage level – observe n_i,0 for genotype i; allows estimate of P_i
Consider marginal distn. of missing categorical covariate (genotype)
Using zero dosage level:

P̂_i^(0) = n_i,0 / N_0 ,  p̂_ij^(0) = 0.5

This is the key – the marginal distribution of the missing categorical covariate
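A sketch of this initialization from the zero-dosage row of the beetle table (all 98 beetles survive there, so every genotype is observed; variable names here are mine):

```python
# genotype counts among the 98 zero-dosage survivors, from the beetle table
n_i0 = {"-/B": 31, "-/H": 27, "-/A": 10, "+/B": 6, "+/H": 20, "+/A": 4}
N_0 = sum(n_i0.values())                           # 98 receiving zero dosage

P_hat0 = {g: c / N_0 for g, c in n_i0.items()}     # P_i^(0) = n_{i,0} / N_0
p_hat0 = 0.5                                       # flat starting death prob.
print(N_0, round(P_hat0["-/B"], 4))
```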
30
Expectation Step
Dropping “constants” log N_j! and log n_ij! :

Q̃(k) = Σ_{i,j} { Ñ_ij^(k) log P_i + n_ij log(1 − p_ij) + (Ñ_ij^(k) − n_ij) log p_ij − L_ij^(k) }

Need to evaluate:

Ñ_ij^(k) = E[ Ñ_ij | n, θ̂(k) ] ,  L_ij^(k) = E[ log (Ñ_ij − n_ij)! | n, θ̂(k) ]

(*)
31
Expectation Step

Bayes Formula:

h(Ñ_j | n_j, N_j) = f(n_j | Ñ_j) f(Ñ_j | N_j) / Σ_{Ñ_j} f(n_j | Ñ_j) f(Ñ_j | N_j)

This conditional distribution is Multinomial:

(Ñ_j − n_j) | n_j, N_j ~ Multinomial( N_j − Σ_l n_lj , π_j ) ,  with π_ij ∝ P_i p_ij

So:

Ñ_ij^(k) = n_ij + ( N_j − Σ_l n_lj ) · P̂_i^(k) p̂_ij^(k) / Σ_l P̂_l^(k) p̂_lj^(k)

(*)
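Based on this reconstructed update (unobserved deaths at dosage j allocated to genotypes in proportion to P_i p_ij), a toy numeric sketch; the numbers here are made up and the property checked is that the expected counts add back to N_j:

```python
# hypothetical current estimates at one dosage level j, three genotypes
P = [0.5, 0.3, 0.2]      # genotype proportions P_i^(k)
p = [0.9, 0.6, 0.2]      # death probabilities p_ij^(k)
n = [2, 5, 10]           # observed survivors n_ij
N_j = 100                # total receiving this dosage

deaths = N_j - sum(n)                      # unobserved deaths to allocate
w = [P[i] * p[i] for i in range(3)]        # allocation weights P_i * p_ij
s = sum(w)
N_tilde = [n[i] + deaths * w[i] / s for i in range(3)]
print(N_tilde, sum(N_tilde))               # expected counts sum back to N_j
```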
32
Expectation Step
For L_ij^(k) = E[ log (Ñ_ij − n_ij)! | n, θ̂(k) ] :
  Not needed for maximization – only affects EM convergence rate
  Direct calculation from multinomial distn. is “possible” – but computationally prohibitive
  Need to employ some approximation strategy:
  second-order Taylor series about Ñ_ij^(k) − n_ij, using Binet’s formula

log (N − n)! ≈ (N − n + ½) log(N − n + 1) − (N − n + 1) + ½ log(2π)

(*)
33
Expectation Step
Consider Binet’s formula (like Stirling’s):

log (N − n)! ≈ (N − n + ½) log(N − n + 1) − (N − n + 1) + ½ log(2π)

Have:

L_ij^(k) = E[ log (Ñ_ij − n_ij)! | n, θ̂(k) ]

Use a second-order Taylor series approximation, taken about Ñ_ij^(k) − n_ij as a function of (Ñ_ij − n_ij), requiring only E[ Ñ_ij − n_ij | n, θ̂(k) ] and E[ (Ñ_ij − n_ij)² | n, θ̂(k) ]

[Figure: log(N − n)! vs. N − n, for N − n from 0 to 50]

(*)
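A quick numerical check of this Binet/Stirling-type approximation (as reconstructed above, including the ½log(2π) constant), against the exact log-factorial:

```python
import math

def approx_log_factorial(z):
    """log(z!) ≈ (z + 1/2) log(z + 1) - (z + 1) + (1/2) log(2*pi)."""
    return (z + 0.5) * math.log(z + 1) - (z + 1) + 0.5 * math.log(2 * math.pi)

for z in (5, 10, 50):
    exact = math.lgamma(z + 1)     # exact log(z!)
    print(z, exact, approx_log_factorial(z))   # already close by z = 10
```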
34
Maximization Step

Portion of Q̃(k) related to P:

Q̃_P^(k) = Σ_{i,j} Ñ_ij^(k) log P_i  (with Σ_i P_i = 1)  →  P̂^(k+1) by Lagrange multipliers

Portion of Q̃(k) related to p:

Q̃_p^(k) = Σ_{i,j} [ n_ij log(1 − p_ij) + (Ñ_ij^(k) − n_ij) log p_ij ]  →  p̂^(k+1) by Newton-Raphson iterations, with some parameterization (G_i, D_i)

(*)
35
Convergence
[Figure: expected log-likelihood Q vs. EM iteration]
EM Convergence with Criterion 1e-12: 1639 Iterations in 52 Seconds
36
Dose Response Curves (log scale)
[Figure: six panels of fitted Prob. of death vs. Dosage (0.001 to 1.0, log scale), one panel per genotype: -/B, -/H, -/A, +/B, +/H, +/A]
37
EM Results
Test statistic for H0: no dosage effect
Separation of points …
38
Topics Used Here
Calculus
  Differentiation & Integration (including vector differentiation)
  Lagrange Multipliers
  Taylor Series Expansions
Linear Algebra
  Determinants & Eigenvalues
  Inverting [computationally/nearly singular] Matrices
  Positive Definiteness
Probability
  Distributions: Multivariate Normal, Binomial, Multinomial
  Bayes Formula
Statistics
  Logistic Regression
  Separation of Points
  [Penalized] Likelihood Maximization
  EM Algorithm
Biology – a little time and communication