AdaPT: A New Class of Ordered Testing Procedures
Lihua Lei and William Fithian
Department of Statistics, UC Berkeley
JSM 2016, Chicago
Lihua Lei and William Fithian AdaPT: A New Class of Ordered Testing Procedures
Table of Contents
1 Setup
2 Existing Methods
3 Adaptive P-value Thresholding (AdaPT)
Multiple Testing Problem with FDR Control
General setup: a sequence of hypotheses $H_1, H_2, \ldots, H_n$;
$\mathcal{H}_0 = \{i : H_i \text{ is true}\}$ is the set of true null hypotheses;
$S = \{i : H_i \text{ is rejected}\}$ is the set of discoveries;
$\mathrm{FDP} = \frac{V}{R \vee 1}$ is the False Discovery Proportion, with $V = |S \cap \mathcal{H}_0|$ and $R = |S|$;
$\mathrm{FDR} = \mathbb{E}[\mathrm{FDP}]$ is the False Discovery Rate, the target that a procedure should control.
A procedure that controls FDR at level 0.1 produces a rejection set $S$ in which, in expectation, at least about 90% of the discoveries are true.
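The FDP definition can be checked on a toy example. The sketch below assumes that V counts the false discoveries and R counts all discoveries; the index sets are made up for illustration and are not from the talk.

```python
def false_discovery_proportion(rejected, true_nulls):
    """FDP = V / max(R, 1), with V = |S ∩ H0| and R = |S|."""
    rejected = set(rejected)
    v = len(rejected & set(true_nulls))  # false discoveries
    r = len(rejected)                    # all discoveries
    return v / max(r, 1)

# Hypothetical toy data: 10 rejections, exactly one of which is a true null.
S = range(10)
H0 = {9, 10, 11, 12}
fdp = false_discovery_proportion(S, H0)
print(fdp)  # 0.1
```

The `max(r, 1)` term plays the role of $R \vee 1$, so an empty rejection set has FDP 0 rather than an undefined ratio.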
Ordered Hypothesis Testing
Domain knowledge might be used to indicate which hypotheses are more “promising”, i.e. likely to be rejected;
Heuristically, more focus should be put on “promising” hypotheses;
Sort $H_1, \ldots, H_n$ from most “promising” to least “promising” using the prior knowledge;
A procedure that takes advantage of the ordering is called an ordered hypothesis testing procedure.
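As a minimal sketch of the sorting step, assume the prior knowledge is summarized by one numeric score per hypothesis, with larger meaning more promising. The `scores` values and the function name are hypothetical.

```python
def order_by_prior(pvals, scores):
    """Reorder p-values so that hypotheses with larger prior scores come first."""
    # sorted() is stable, so ties keep their original relative order.
    order = sorted(range(len(pvals)), key=lambda i: -scores[i])
    return order, [pvals[i] for i in order]

# Hypothetical toy data.
pvals = [0.5, 0.01, 0.2]
scores = [0.1, 0.9, 0.4]
order, ordered_pvals = order_by_prior(pvals, scores)
print(order, ordered_pvals)  # [1, 2, 0] [0.01, 0.2, 0.5]
```

The scores must not be computed from the p-values themselves; the ordering is meant to come from side information only.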
Example: GEOquery Data
GEOquery data¹ [LB15] consists of gene expression measurements in response to estrogen in breast cancer cells;
Consists of $n = 22283$ genes and two groups (a treatment group and a control group) with 5 trials in each;
Test $H_i : F_{0i} = F_{1i}$, where $F_{0i}$ and $F_{1i}$ are the distributions of the expression of gene $i$ in the control group and the treatment group, respectively;
$H_1, \ldots, H_n$ are ordered by auxiliary data.
¹ http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS2324
Example: GEOquery Data
[Figure: number of discoveries versus target FDR level α (0.1 to 0.3), in three panels: original ordering, moderately informative ordering, highly informative ordering. Methods compared: SeqStep, Accum. Test, ForwardStop, Adaptive SeqStep, BH, Storey, SABHA, AdaPT.]
Existing Methods Revisited: Accumulation Test
$$\widehat{\mathrm{FDP}}_{\mathrm{AT}} = \frac{C + \sum_{i=1}^{k} h(p_i)}{k + 1}$$
[Figure: "Accumulation Test"; p-values plotted against index, with 1, k and n marked on the index axis and 0 and s on the p-value axis.]
$h \in [0, C]$, $\int_0^1 h(x)\,dx = 1$;
Find the maximum $k$ such that $\widehat{\mathrm{FDP}}_{\mathrm{AT}} \le q$;
ForwardStop [GWCT15]: $h(x) = -\log(1 - x)$;
SeqStep [BC15]: $h(x) = \frac{I(x > \lambda)}{1 - \lambda}$;
HingeExp [LB15]: $h(x) = -\frac{I(x > \lambda)}{1 - \lambda} \log\left(\frac{1 - x}{1 - \lambda}\right)$.
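The stopping rule can be sketched in a few lines. The example below uses the SeqStep accumulation function with C = 1/(1 − λ), which is the upper bound of that h. The p-values are hypothetical toy data, and returning 0 when no k qualifies is a convention assumed here.

```python
def accumulation_test(pvals, h, C, q):
    """Largest k with (C + sum_{i<=k} h(p_i)) / (k + 1) <= q, or 0 if none."""
    k_hat, running = 0, 0.0
    for k, p in enumerate(pvals, start=1):
        running += h(p)
        if (C + running) / (k + 1) <= q:
            k_hat = k  # keep scanning: the rule takes the maximum k
    return k_hat

def seqstep_h(lam):
    """SeqStep accumulation function h(x) = I(x > lam) / (1 - lam)."""
    return lambda x: (1.0 if x > lam else 0.0) / (1.0 - lam)

# Hypothetical ordered p-values: 20 small ones followed by two large ones.
pvals = [0.01] * 20 + [0.6, 0.8]
lam = 0.5
k_hat = accumulation_test(pvals, seqstep_h(lam), C=1.0 / (1.0 - lam), q=0.2)
print(k_hat)  # 21
```

The first k_hat hypotheses in the ordering are rejected, so the result depends heavily on how informative the ordering is.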
Existing Methods Revisited: Selective Seqstep
$$\widehat{\mathrm{FDP}}_{\mathrm{SS}} = \frac{ks}{R(k; s) \vee 1} \cdot \frac{A(k; s) + 1}{k(1 - s)}$$
[Figure: "Selective Seqstep"; p-values plotted against index, with 1, k and n marked on the index axis and 0, s and 1 on the p-value axis.]
$R(k; s) = |\{i \le k : p_i \le s\}|$;
$A(k; s) = |\{i \le k : p_i > s\}|$;
$s$ is pre-fixed;
Find the maximum $k$ such that $\widehat{\mathrm{FDP}}_{\mathrm{SS}} \le q$.
It turns out that the second factor, $\frac{A(k; s) + 1}{k(1 - s)}$ (shown in blue on the slide), should be an approximation of $\pi_{0,k}$, where
$$\pi_{0,k} = \frac{|\{1, \ldots, k\} \cap \mathcal{H}_0|}{k};$$
This approximation is too conservative for small $s$.
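A minimal sketch of Selective SeqStep follows. It assumes the usual convention that, once k is chosen, the rejected hypotheses are those among the first k with p_i ≤ s. The p-values, s and q are hypothetical toy values.

```python
def selective_seqstep(pvals, s, q):
    """Return (k_hat, rejected 0-based indices) for the Selective SeqStep rule."""
    k_hat, R, A = 0, 0, 0
    for k, p in enumerate(pvals, start=1):
        if p <= s:
            R += 1  # R(k; s): p-values at or below s among the first k
        else:
            A += 1  # A(k; s): p-values above s among the first k
        fdp_hat = (k * s / max(R, 1)) * ((A + 1) / (k * (1 - s)))
        if fdp_hat <= q:
            k_hat = k
    rejected = [i for i in range(k_hat) if pvals[i] <= s]
    return k_hat, rejected

# Hypothetical ordered p-values.
pvals = [0.01] * 8 + [0.9, 0.02, 0.7, 0.8]
k_hat, rejected = selective_seqstep(pvals, s=0.1, q=0.03)
print(k_hat, len(rejected))  # 10 9
```

Algebraically the estimate reduces to s/(1 − s) · (A + 1)/(R ∨ 1), which makes clear that every p-value above s, null or not, inflates it.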
Existing Methods Revisited: Adaptive Seqstep
$$\widehat{\mathrm{FDP}}_{\mathrm{AS}} = \frac{ks}{R(k; s) \vee 1} \cdot \frac{A(k; \lambda) + 1}{k(1 - \lambda)}$$
[Figure: "Adaptive Seqstep"; p-values plotted against index, with 1, k and n marked on the index axis and 0, s, λ and 1 on the p-value axis.]
$R(k; s) = |\{i \le k : p_i \le s\}|$;
$A(k; \lambda) = |\{i \le k : p_i > \lambda\}|$;
$s$ and $\lambda$ are pre-fixed;
Find the maximum $k$ such that $\widehat{\mathrm{FDP}}_{\mathrm{AS}} \le q$;
Much less conservative if a large $\lambda$, say 0.5, is used.
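To illustrate why a larger λ helps, the sketch below compares the null-proportion factor at threshold s with the same factor at threshold λ on hypothetical p-values in which several non-nulls fall between s and λ. All numbers and function names are illustrative assumptions.

```python
def pi0_estimate(pvals, k, t):
    """(A(k; t) + 1) / (k * (1 - t)), where A counts p_i > t among the first k."""
    A = sum(1 for p in pvals[:k] if p > t)
    return (A + 1) / (k * (1 - t))

def seqstep_fdp(pvals, k, s, lam):
    """FDP estimate k*s / (R(k; s) v 1) times the pi0 factor at threshold lam."""
    R = sum(1 for p in pvals[:k] if p <= s)
    return (k * s / max(R, 1)) * pi0_estimate(pvals, k, lam)

# Hypothetical ordered p-values; many lie between s = 0.01 and lam = 0.5.
pvals = [0.001, 0.03, 0.002, 0.2, 0.004, 0.1, 0.003, 0.05, 0.6, 0.005]
k, s, lam = 10, 0.01, 0.5

pi0_selective = pi0_estimate(pvals, k, s)    # threshold s, as in Selective SeqStep
pi0_adaptive = pi0_estimate(pvals, k, lam)   # threshold lam, as in Adaptive SeqStep
fdp_selective = seqstep_fdp(pvals, k, s, s)  # lam = s recovers Selective SeqStep
fdp_adaptive = seqstep_fdp(pvals, k, s, lam)
print(pi0_selective, pi0_adaptive)
```

Here the factor drops from about 0.61 to 0.4, because moderate p-values such as 0.05 or 0.2 no longer count toward A.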
AdaPT
$$\widehat{\mathrm{FDP}}_{\mathrm{AdaPT}} = \frac{A(k; w) + 1}{R(k; w) \vee 1}$$
[Figure: "AdaPT (Step 0)"; p-values plotted against index (1 to n), with 0, w, 1 − w and 1 marked on the p-value axis.]
$R(k; w) = |\{i : p_i \le w_i\}|$;
$A(k; w) = |\{i : p_i > 1 - w_i\}|$;
Estimate the next $w$ from $\tilde{p}_i$:
$$\tilde{p}_i = \begin{cases} p_i & (p_i \text{ is black}) \\ \mathrm{NA} & (p_i \text{ is red or blue}) \end{cases}$$
Repeat until $\widehat{\mathrm{FDP}}_{\mathrm{AdaPT}} \le q$.
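The loop can be sketched as follows, under several assumptions that go beyond the slide: "black" is taken to mean w_i < p_i ≤ 1 − w_i, every w_i is below 0.5, thresholds are never allowed to increase, and the update rule `drop_last_active` is a hypothetical placeholder that zeroes the last nonzero threshold. This is an illustration of the iteration, not the authors' implementation.

```python
def adapt(pvals, w_init, q, update, max_steps=10_000):
    """Shrink thresholds w until (A + 1) / max(R, 1) <= q; return rejected indices."""
    w = list(w_init)
    for _ in range(max_steps):
        R = sum(1 for p, wi in zip(pvals, w) if p <= wi)
        A = sum(1 for p, wi in zip(pvals, w) if p > 1 - wi)
        if (A + 1) / max(R, 1) <= q:
            return [i for i, (p, wi) in enumerate(zip(pvals, w)) if p <= wi]
        if all(wi == 0 for wi in w):
            break  # nothing left to shrink
        # The update rule only sees masked p-values (None stands for NA).
        masked = [p if wi < p <= 1 - wi else None for p, wi in zip(pvals, w)]
        proposal = update(masked, w)
        # Assumption of this sketch: thresholds may only decrease.
        w = [min(new, old) for new, old in zip(proposal, w)]
    return []

def drop_last_active(masked, w):
    """Hypothetical update rule: set the last nonzero threshold to zero."""
    new_w = list(w)
    for i in range(len(new_w) - 1, -1, -1):
        if new_w[i] > 0:
            new_w[i] = 0.0
            break
    return new_w

# Hypothetical ordered p-values with the signals placed first.
pvals = [0.01, 0.02, 0.03, 0.01, 0.04, 0.02, 0.95, 0.5, 0.97, 0.6]
rejected = adapt(pvals, [0.1] * len(pvals), q=0.2, update=drop_last_active)
print(rejected)  # [0, 1, 2, 3, 4, 5]
```

A more useful update rule would fit a model to the masked p-values and lower the thresholds where nulls appear concentrated; the placeholder ignores `masked` only to keep the example short.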
[Figures: "AdaPT (Step 1)" through "AdaPT (Step 4)", with the same axes as the Step 0 plot; the formula and definitions on these slides are identical to those on the Step 0 slide.]
Theorem 1.
Assume that
1 $\{p_i : i = 1, \ldots, n\}$ are independent;
2 $\{p_i : i \in \mathcal{H}_0\}$ are i.i.d. uniform on $[0, 1]$.
Then AdaPT controls FDR at level q.
Any method of updating $w$ that uses only the masked p-values $\tilde{p}_i$ preserves FDR control!
Real Example: GEOquery Data
[Figure: number of discoveries versus target FDR level α (0.1 to 0.3), in three panels: original ordering, moderately informative ordering, highly informative ordering. Methods compared: SeqStep, Accum. Test, ForwardStop, Adaptive SeqStep, BH, Storey, SABHA, AdaPT.]
References
[BC15] Rina Foygel Barber and Emmanuel J. Candès. Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43(5):2055–2085, 2015.
[GWCT15] Max Grazier G’Sell, Stefan Wager, Alexandra Chouldechova, and Robert Tibshirani. Sequential selection procedures and false discovery rate control. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2015.
[LB15] Ang Li and Rina Foygel Barber. Accumulation tests for FDR control in ordered hypothesis testing. arXiv preprint arXiv:1505.07352, 2015.
THANK YOU!