i) AIC, ORIC and the new method use information criteria and select the model with the largest adjusted log-likelihood.
ii) MCT defines a contrast for each elementary alternative model and tests all of them in one multiple contrast test; after the null hypothesis is rejected for at least one alternative, the alternative with the largest test statistic is selected.
Model Selection under Change Point Order Restriction
Xuefei Mi, L.A. Hothorn
Institute of Biostatistics, Leibniz University Hannover, Germany
1. Change-point detection
Example
Objectives
i) To control the familywise error rate over all k-1 alternative models.
ii) When the null hypothesis is rejected at level α, to select one of the elementary models.
3. Competing approaches
i) Common AIC (Akaike, 1973)
ii) Order restricted information criteria (Anraku, 1999)
iii) Multiple Contrast Test (MCT) (Bretz and Hothorn, 2002): one contrast test statistic T_j is defined for each elementary alternative.
iv) Non-parametric approach (Xiong and Barmi, 2002): similar idea, but the penalty is calculated by simulation.
6. Conclusions
i) Model selection approaches for some ordered alternatives, modified so that they control α.
ii) A global decision AND a decision in favor of a particular elementary alternative model.
Email: [email protected]
References:
Anraku, K. An information criterion for parameters under a simple order restriction. Biometrika, 1999;86:141-152.
Akaike, H. Information theory and an extension of the maximum likelihood principle. Second International Symposium on Information Theory. Akadémiai Kiadó, 1973:267-281.
Bretz, F. and Hothorn, L.A. Detecting dose-response using contrasts: asymptotic power and sample size determination for binomial data. Statist. Med., 2002;21:3325.
Ninomiya, Y. Information criterion for Gaussian change-point model. Stat. Probability Lett., 2005;72:237-247.
Ninomiya, Y. Personal communication, 2006.
2. Decompose the global alternative into all elementary ones
The maximum likelihood estimators under order restriction for the different elementary alternatives are calculated by the pool-adjacent-violators algorithm (Robertson et al., 1988).
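As an illustration only (not the authors' implementation), a weighted pool-adjacent-violators sketch for binomial proportions under the simple order p_0 ≤ p_1 ≤ ... ≤ p_k could look as follows. The function name `pava` and the use of group sizes as weights are our assumptions. Applied to the adverse-event counts of the dose-finding example, it pools the first two groups, which gives the grouping of the HA2 row.

```python
def pava(successes, totals):
    """Weighted pool-adjacent-violators for a non-decreasing binomial MLE.

    Returns one fitted proportion per group, constrained so that
    p_0 <= p_1 <= ... <= p_k (simple order).
    """
    # Each block stores [pooled successes, pooled total, number of groups].
    blocks = []
    for x, n in zip(successes, totals):
        blocks.append([x, n, 1])
        # Merge backwards while the last two blocks violate the order.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            x2, n2, c2 = blocks.pop()
            blocks[-1][0] += x2
            blocks[-1][1] += n2
            blocks[-1][2] += c2
    # Expand each pooled block back to one fitted value per group.
    fitted = []
    for x, n, count in blocks:
        fitted.extend([x / n] * count)
    return fitted


# Adverse-event counts from the dose-finding example (Placebo, 0.125, 1.0).
presence = [9, 19, 24]
total = [20, 43, 41]
print([round(p, 3) for p in pava(presence, total)])
```

Each pooled block is the ratio of summed successes to summed totals, which is the binomial MLE of a common rate, and every merge removes one block, so the pass is linear in the number of groups.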
The estimated information, calculated from the log-likelihood of the estimators, is used to identify the “true” model.
Kullback-Leibler Distance (Anraku, 1999)
The constant term is omitted. The model with the largest (-KL) distance is selected as the most plausible model.
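A minimal sketch of this selection rule, assuming binomial log-likelihoods with the constant binomial coefficients dropped and the penalty values tabulated on this poster (AIC 1, 2, 2; Anraku 1, 1.5, 1.5; new 1, 2.92, 2.92 for H0, HA1, HA2). The names `models`, `penalties`, `restricted_loglik` and `select` are hypothetical. Each elementary model is a partition of the dose groups; if the pooled rates violate the order restriction, the fit falls back to the pooled H0 estimate, which is the restricted MLE for a two-block split.

```python
import math

# Adverse-event counts from the dose-finding example (Placebo, 0.125, 1.0).
x = [9, 19, 24]   # presence
n = [20, 43, 41]  # group sizes

# Each elementary model as a partition of the dose groups into blocks with a common rate.
models = {
    "H0": [[0, 1, 2]],
    "HA1": [[0], [1, 2]],
    "HA2": [[0, 1], [2]],
}

# Penalty values as tabulated on the poster, per model.
penalties = {
    "AIC": {"H0": 1.0, "HA1": 2.0, "HA2": 2.0},
    "Anraku": {"H0": 1.0, "HA1": 1.5, "HA2": 1.5},
    "New": {"H0": 1.0, "HA1": 2.92, "HA2": 2.92},
}


def restricted_loglik(blocks):
    """Binomial log-likelihood (coefficients dropped) under increasing block rates."""
    rates = [sum(x[i] for i in blk) / sum(n[i] for i in blk) for blk in blocks]
    if any(left > right for left, right in zip(rates, rates[1:])):
        # Order violated: for a two-block split the restricted MLE is the pooled H0 fit.
        blocks = [[i for blk in blocks for i in blk]]
    ll = 0.0
    for blk in blocks:
        xs = sum(x[i] for i in blk)
        ns = sum(n[i] for i in blk)
        p = xs / ns
        ll += xs * math.log(p) + (ns - xs) * math.log(1.0 - p)
    return ll


def select(method):
    """Return the model with the largest adjusted log-likelihood, plus all scores."""
    scores = {m: restricted_loglik(blk) - penalties[method][m] for m, blk in models.items()}
    return max(scores, key=scores.get), scores


for method in penalties:
    best, scores = select(method)
    print(method, best, {m: round(s, 2) for m, s in scores.items()})
```

With these counts the Anraku penalty picks HA2 and the new penalty picks H0, the same decisions as in the evaluation of the example. The absolute values are on a different scale from the poster's, since constants are dropped here.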
The distribution of the log-likelihood ratio (Robertson et al., 1988)
Our new penalty term
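The tabulated new penalty 2.92 is consistent with 1 + χ²_{1,0.95}/2. The sketch below assumes that the new penalty equals Penalty(H0) + c*/2, where c* is the upper α/2 point of the chi-bar-square distribution of 2[log L(ĝ(x)|H_Aj) - log L(ĝ(x)|H0)]. This reading is our reconstruction, not a formula quoted from the authors. For one change-point the restricted fit has one or two levels, each with probability 1/2. Helper names are hypothetical and only the standard library is used.

```python
import math


def chi2_sf(c, df):
    """Upper tail P(chi^2_df >= c) for integer df >= 0 (df = 0 is a point mass at 0)."""
    if df == 0:
        return 1.0 if c <= 0 else 0.0
    if c <= 0:
        return 1.0
    half = c / 2.0
    if df % 2 == 0:
        # Even df: exp(-c/2) * sum_{j < df/2} (c/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * c / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= c / (2 * j + 1)
    return total


def chibar_sf(c, level_probs):
    """P(chi-bar^2 >= c) = sum_l p(l) * P(chi^2_{l-1} >= c); level_probs[l-1] = p(l)."""
    return sum(p * chi2_sf(c, l - 1) for l, p in enumerate(level_probs, start=1))


def critical_value(level_probs, tail, lo=0.0, hi=100.0):
    """Bisection for c with chibar_sf(c) = tail (chibar_sf decreases in c)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chibar_sf(mid, level_probs) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha = 0.05
level_probs = [0.5, 0.5]  # one change-point: 1 or 2 levels, each with probability 1/2
penalty_h0 = 1.0

c_star = critical_value(level_probs, alpha / 2)  # alpha/2 for the two alternatives
new_penalty = penalty_h0 + c_star / 2
anraku_penalty = sum(l * p for l, p in enumerate(level_probs, start=1))
print(round(c_star, 3), round(new_penalty, 2), anraku_penalty)
```

With α = 0.05 this gives c* ≈ 3.84 = χ²_{1,0.95}, a new penalty of about 2.92 and an Anraku penalty of 1.5, the values in the penalty table.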
5. Epidemic alternatives: two change-points in DNA motif finding
p̂ is assumed to be binomially distributed. Epidemic alternative:
Approximate penalty term for the alternatives (Ninomiya, 2005): penalty = 2 + 3m, where m is the number of change-points. E.g. for a symmetric motif:
4. Local decision: Model selection controlling α
Evaluation of the example:
The Anraku method is sensitive in detecting the change-point, but the over-estimation problem is discussed by Roberts and Martin (2006).
Hypothesis            p̂0      p̂1      p̂2
H0:  p0 = p1 = p2     0.49    0.49    0.49
HA1: p0 < p1 = p2     0.45    0.51    0.51
HA2: p0 = p1 < p2     0.445   0.445   0.58
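Kullback-Leibler distance:
$I(g(x), \hat g(x)) = \text{constant} - E_g[\log L(\hat g(x))]$   (true information)
$-I(g(x), \hat g(x)) \approx -\text{constant} + \log L(\hat g(x)) - \text{Penalty}$   (estimated information, Taylor expansion)

Distribution of the log-likelihood ratio under $H_0$, with level probabilities $p(l, i, H_{Aj})$:
$P\big(2[\log L(\hat g(x) \mid H_{Aj}) - \log L(\hat g(x) \mid H_0)] \ge c\big) = \sum_{l=1}^{i} p(l, i, H_{Aj})\, P(\chi^2_{l-1} \ge c)$

Penalties:
under $H_0$: $\text{Penalty}(\hat g(x) \mid H_0) = 1$
under $H_{Aj}$ (Anraku): $\text{Penalty}(\hat g(x) \mid H_{Aj}) = \sum_{l=1}^{i} l \cdot p(l, i, H_{Aj})$
under $H_{Aj}$ (new, one change-point): $\text{Penalty}(\hat g(x) \mid H_{Aj}) = \text{Penalty}(\hat g(x) \mid H_0) + \tfrac{1}{2}\chi^2_{1,0.95}$

MCT test statistic for alternative j, with contrast coefficients $c_{ij}$, counts $x_i$, group sizes $n_i$ and pooled proportion $\hat p$:
$T_j = \dfrac{\sum_i c_{ij} x_i / n_i}{\sqrt{\hat p (1 - \hat p) \sum_i c_{ij}^2 / n_i}}$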
A motif is a binding site for proteins.
• actgctACTgcacAATTgcgaattctagtcg…tcaaatgc
[Figure: gene, motif (5-30 bp), DNA-binding proteins, RNA polymerase (protein)]
Alphabet: {A, C, G, T}
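Matrix of the 14 aligned motifs (counts per position):
Position  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
A         0  0  0  8  2  2 10  2  8  1  1  1  1  0  0  0  0
C        14  0  2  1  6  2  1  6  0  5  3  6  9  4 13 14  0
G         0 14 12  3  5  7  3  5  0  8  1  6  3  3  1  0 14
T         0  0  0  2  1  3  0  1  6  0  9  1  1  7  0  0  0
total    14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14
max      14 14 12  8  6  7 10  6  8  8  9  6  9  7 13 14 14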
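Assuming that p̂ at each position is the relative frequency of the most common base (the "max" row divided by 14), the conservation profile and a consensus sequence can be read off the count matrix as sketched below. The tie rule (first base in A, C, G, T order) is our own choice.

```python
# Counts of A, C, G, T at each of the 17 positions of the 14 aligned motifs.
counts = {
    "A": [0, 0, 0, 8, 2, 2, 10, 2, 8, 1, 1, 1, 1, 0, 0, 0, 0],
    "C": [14, 0, 2, 1, 6, 2, 1, 6, 0, 5, 3, 6, 9, 4, 13, 14, 0],
    "G": [0, 14, 12, 3, 5, 7, 3, 5, 0, 8, 1, 6, 3, 3, 1, 0, 14],
    "T": [0, 0, 0, 2, 1, 3, 0, 1, 6, 0, 9, 1, 1, 7, 0, 0, 0],
}
n_motifs = 14
positions = len(counts["A"])

consensus = []
max_counts = []
for j in range(positions):
    column = {base: counts[base][j] for base in "ACGT"}
    assert sum(column.values()) == n_motifs  # every column sums to 14
    base = max(column, key=column.get)  # ties resolved by A, C, G, T order
    consensus.append(base)
    max_counts.append(column[base])

# Assumed conservation estimate per position: max count / number of motifs.
p_hat = [m / n_motifs for m in max_counts]
print("".join(consensus))
print([round(p, 2) for p in p_hat])
```

The consensus CGGACGACAGTCCTCCG starts with CGG and ends with CCG, and p̂ is close to 1 at both ends and lower in the middle: the high-low-high pattern that motivates two change-points.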
$H_0: p_0 = p_1 = \dots = p_k$
$H_{Aj}: p_0 = \dots = p_{j-1} \neq p_j = \dots = p_k$
Treatment   Placebo   0.125   1.0
Presence    9         19      24
Absent      11        24      17
Total       20        43      41
p̂i          0.45      0.44    0.58
$H_0: p_0 = p_1 = \dots = p_k$
$H_{A(i,j)}: p_0 = \dots = p_i \neq p_{i+1} = \dots = p_j \neq p_{j+1} = \dots = p_k$, with change-points $i < j$
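A rough sketch of a two-change-point scan on the max-base counts of the motif matrix, using Ninomiya's approximate penalty 2 + 3m. Two assumptions are ours: the penalty is added to -2 log L (AIC scale), and the segment rates are fitted without an order restriction. This is therefore a plain exhaustive change-point search, not the order-restricted procedure of the poster.

```python
import math

# "max" row of the motif matrix: count of the most common base at each of the
# 17 positions, out of 14 aligned motifs.
max_counts = [14, 14, 12, 8, 6, 7, 10, 6, 8, 8, 9, 6, 9, 7, 13, 14, 14]
n_motifs = 14


def segment_loglik(counts):
    """Binomial log-likelihood of a segment sharing one success probability."""
    x = sum(counts)
    n = n_motifs * len(counts)
    ll = 0.0
    if x > 0:
        ll += x * math.log(x / n)
    if x < n:
        ll += (n - x) * math.log(1.0 - x / n)
    return ll


def criterion(cuts):
    """-2 log L + (2 + 3m), m = number of change-points (assumed AIC-type scale).

    `cuts` are segment boundaries: (3, 14) means segments [0:3], [3:14], [14:17].
    """
    bounds = [0, *cuts, len(max_counts)]
    ll = sum(segment_loglik(max_counts[a:b]) for a, b in zip(bounds, bounds[1:]))
    return -2.0 * ll + 2 + 3 * len(cuts)


k = len(max_counts)
# Candidate models: no change-point, one change-point, two change-points.
candidates = [()]
candidates += [(i,) for i in range(1, k)]
candidates += [(i, j) for i in range(1, k) for j in range(i + 1, k)]

best = min(candidates, key=criterion)
print("change-points:", len(best), "cuts:", best, "criterion:", round(criterion(best), 2))
```

On these counts the scan selects m = 2 with cuts after positions 3 and 14 (segments 1-3, 4-14, 15-17), i.e. two conserved ends around a variable middle.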
Here, $\{T_1, T_2, \dots, T_q\}$ is asymptotically q-variate standard normally distributed with correlation matrix $R$.
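For the three-group dose-finding example, the statistics T_j and their correlation can be computed directly. The contrast coefficients (-1, 1/2, 1/2) for HA1 and (-1/2, -1/2, 1) for HA2 are our assumption, since the poster does not list them; p̂ is the pooled proportion under H0. Because both T_j share the same pooled variance, their correlation depends only on the contrasts and the group sizes.

```python
import math

# Dose-finding example: adverse-event counts x_i out of n_i per group.
x = [9, 19, 24]
n = [20, 43, 41]

# Hypothetical contrasts, one per elementary alternative (each sums to zero).
contrasts = {
    "HA1": [-1.0, 0.5, 0.5],   # p0 < p1 = p2
    "HA2": [-0.5, -0.5, 1.0],  # p0 = p1 < p2
}

p_pooled = sum(x) / sum(n)  # pooled rate under H0
v = p_pooled * (1.0 - p_pooled)


def t_stat(c):
    """Standardized contrast of the group proportions, pooled variance."""
    num = sum(ci * xi / ni for ci, xi, ni in zip(c, x, n))
    den = math.sqrt(v * sum(ci ** 2 / ni for ci, ni in zip(c, n)))
    return num / den


def correlation(c1, c2):
    """Asymptotic correlation of two contrast statistics (the pooled variance cancels)."""
    cov = sum(a * b / ni for a, b, ni in zip(c1, c2, n))
    s1 = sum(a ** 2 / ni for a, ni in zip(c1, n))
    s2 = sum(b ** 2 / ni for b, ni in zip(c2, n))
    return cov / math.sqrt(s1 * s2)


T = {name: t_stat(c) for name, c in contrasts.items()}
rho = correlation(contrasts["HA1"], contrasts["HA2"])
best = max(T, key=T.get)  # candidate model, only meaningful after a global rejection
print({name: round(t, 3) for name, t in T.items()}, round(rho, 3), best)
```

This gives T ≈ 0.51 and 1.35 with a correlation of about 0.61. The one-sided 5% critical value for the maximum of two standard normal statistics is at least 1.645, so neither statistic is significant here, in line with H0 being selected by MCT in the example.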
Penalty   H0   HA1    HA2
AIC       1    2      2
Anraku    1    1.5    1.5
New       1    2.92   2.92
$p_0 = \dots = p_3,\; p_4 = \dots = p_{12},\; p_{13} = \dots = p_{15}$
New vs. MCT
- Higher power than MCT
- Simpler and faster
- MCT provides confidence intervals
New vs. Anraku
- Controls the α level
- Does not over-estimate
Robertson, T., Wright, F.T. and Dykstra, R.L. Order restricted statistical inference. Wiley, New York, 1988.
Roberts, S. and Martin, M.A. The question of nonlinearity in the dose-response relation between particulate matter air pollution and mortality: can Akaike's Information Criterion be trusted to take the right turn? American Journal of Epidemiology, 2006;164(12).
Stormo, G., Schneider, T., Gold, L. and Ehrenfeucht, A. Use of the 'Perceptron' algorithm to distinguish translational initiation sites in Escherichia coli. Nucleic Acids Res., 1982;10:2997-3011.
Van Zwet, E., Kechris, K.J., Bickel, P.J. et al. Estimating motifs under order restrictions. Statistical Applications in Genetics and Molecular Biology, 2005;4:1.
Xiong, C. and Barmi, H. On detecting change in likelihood ratio ordering. Nonparametric Statistics, 2002;14:555-568.
Zhu, J. and Zhang, M. A promoter database of yeast Saccharomyces cerevisiae. Bioinformatics, 1999;15:607-611.
i) Reject $H_0$ globally against $H_A$:
$H_{Aj}: p_0 = \dots = p_{j-1} < p_j = \dots = p_k$
Dose-finding study with adverse event rates (Bretz and Hothorn, 2002)
Special case of order restriction: one change-point
This new method develops a different penalty for each $H_{Aj}$, which holds the α level, by using the distribution of $\log L(\hat g(x) \mid H_{Aj}) - \log L(\hat g(x) \mid H_0)$. The distances of the different alternatives are highly correlated; α/2 is used to control multiplicity.
Method    H0      HA1 (MCT: CI1)        HA2 (MCT: CI2)        Best model
Anraku    -7.91   -8.27                 -7.41                 HA2
New       -7.91   -9.72                 -8.86                 H0
MCT               0.06 (-0.20, 0.33)    0.13 (-0.08, 0.36)    H0
[Figure panels: correct model selection rate vs. Delta, with a reference line at 0.95]
- New method with pattern p1 = 0.3, p2 = 0.3, p3 = 0.3; curves by model (H0, HA1, HA2) and sample size (25, 50, 100)
- MCT with pattern p1 = 0.3, p2 = 0.3, p3 = 0.3; curves by model and sample size
- Ninomiya penalty; curves by sample size and p1 (30-0.80, 30-0.90, 14-0.80, 14-0.90)
- New penalty; curves by sample size and p1 (30-0.80, 30-0.90, 14-0.80, 14-0.90)
- Anraku with pattern p1 = 0.3, p2 = 0.3, p3 = 0.3; curves by model and sample size