Controlling false discovery rate via knockoffs
Rina Foygel Barber & Emmanuel Candès
Code & demos available at http://web.stanford.edu/~candes/Knockoffs/
Paper available at http://arxiv.org/abs/1404.5609
Jan 21 2015
Jan 21 2015 Controlling false discovery rate via knockoffs 1/36
Setting
An example: which mutations in the reverse transcriptase (RT) of HIV-1 determine susceptibility to reverse transcriptase inhibitors (RTIs)?

• y_i ∈ ℝ = resistance of the virus in sample i to an RTI-type drug
• X_ij ∈ {0, 1} indicates whether mutation j is present in virus sample i

How can we select mutations that determine drug resistance, in such a way that our answer will replicate in further trials?
Setting
Sparse linear model:

y = Xβ + z, where z_i ~iid N(0, σ²)

• n observations, p features
• β is sparse
Setting
Goal: select a set of features X_j that are likely to be relevant to the response y, without too many false positives.

One way to measure performance is the false discovery rate (FDR), the expectation of the false discovery proportion (FDP):

FDR = E[ # false positives / total # of features selected ] = E[ |S ∩ H0| / |S| ]

where S = set of selected features and H0 = "null hypotheses" = {j : β*_j = 0}.
Sparse regression
Lasso:

β̂^λ = argmin_{β ∈ ℝ^p} { ½ ‖y − Xβ‖²₂ + λ ‖β‖₁ }

Asymptotically, the Lasso will select the correct model (at a well-chosen λ). In practice, for a finite sample:

• True positives & false positives are intermixed along the Lasso path
• How do we pick λ to balance FDR vs. power?
• We need to account for correlations between the X_j & for weak signals that may have been missed on the Lasso path.
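The Lasso fit above can be reproduced with standard tools. A minimal sketch (not the authors' code) using scikit-learn, whose objective is scaled by 1/(2n) so that α = λ/n; the sparsity k = 30 and signal amplitude are my own choices, mirroring only the n and p of the talk's simulation:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 1500, 500, 30                          # n, p as in the talk; k assumed
X = rng.standard_normal((n, p)) / np.sqrt(n)     # columns with norm ~ 1
beta = np.zeros(p)
beta[:k] = 3.5                                   # assumed signal amplitude
y = X @ beta + rng.standard_normal(n)

# sklearn minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1,
# so alpha = lambda / n matches the talk's objective at lambda = 1.75
lam = 1.75
model = Lasso(alpha=lam / n, fit_intercept=False).fit(X, y)
selected = np.flatnonzero(model.coef_)           # features with nonzero fitted coefficient
```

As the next slides illustrate, the selected set mixes true and false positives, and nothing in the fit itself tells us which is which.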
Sparse regression

Simulated data with n = 1500, p = 500. Lasso fitted model for λ = 1.75:

[Figure: fitted coefficients β̂_j plotted against the index j = 1, …, 500.]
[The same figure, with individual selected coefficients annotated "False positive?" / "True positive?" — the plot alone cannot distinguish them.]
In this simulation the truth is known: FDP = 26/55 ≈ 47% at λ = 1.75.

To estimate FDP in general, we would need to calculate the distribution of β̂^λ_j for null j (which would require knowing σ², β*, …). (Donoho et al 2009)
Construct knockoffs
Main idea: for each feature X_j, construct a knockoff version X̃_j. The knockoffs serve as a "control group" ⇒ we can estimate FDP.

Setting:
• Require n > p (ongoing work for the high-dimensional setting)
• No need to know σ²
• No need for any information about β*
• Yields an exact, finite-sample guarantee for FDR
Construct knockoffs
Construction:

• The knockoffs replicate the correlation structure of X:

  X̃_j^T X̃_k = X_j^T X_k for all j, k

• They also preserve the correlations between knockoffs & originals:

  X̃_j^T X_k = X_j^T X_k for all j ≠ k

Augmented design matrix:

[X X̃] = (X_1 X_2 … X_p X̃_1 X̃_2 … X̃_p) ∈ ℝ^{n×2p}
Construct knockoffs
How?

Define X̃ = X (I_p − 2ξΣ⁻¹) + U C, where:

• Σ = X^T X ⪰ ξ I_p
• U = an n × p orthonormal matrix orthogonal to the column span of X
• C^T C = 4(ξ I_p − ξ² Σ⁻¹) (Cholesky decomposition)

⟹ [X X̃]^T [X X̃] = ( Σ          Σ − 2ξI_p
                      Σ − 2ξI_p   Σ        )
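The construction above can be written out directly. A numerical sketch (my own, under the assumptions that X has full column rank, n ≥ 2p, and Σ ≻ ξI_p so the Cholesky factor exists):

```python
import numpy as np

def construct_knockoffs(X, xi):
    """Knockoffs via X_tilde = X (I - 2 xi Sigma^{-1}) + U C."""
    n, p = X.shape
    Sigma = X.T @ X
    Sigma_inv = np.linalg.inv(Sigma)
    # U: n x p matrix with orthonormal columns, orthogonal to the column span of X
    Q, _ = np.linalg.qr(np.hstack([X, np.random.randn(n, p)]))
    U = Q[:, p:2 * p]
    # C^T C = 4 (xi I - xi^2 Sigma^{-1}); requires Sigma > xi I for positive definiteness
    CtC = 4 * (xi * np.eye(p) - xi**2 * Sigma_inv)
    C = np.linalg.cholesky(CtC).T     # upper-triangular factor, so C.T @ C = CtC
    return X @ (np.eye(p) - 2 * xi * Sigma_inv) + U @ C
```

One can verify the Gram-matrix identities of the slide numerically: X̃^T X̃ = Σ and X̃^T X = Σ − 2ξI_p.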
Construct knockoffs
Why?

For a null feature X_j,

X_j^T y = X_j^T Xβ* + X_j^T z =ᵈ X̃_j^T Xβ* + X̃_j^T z = X̃_j^T y

(=ᵈ denotes equality in distribution).
Construct knockoffs
Lemma 1 (pairwise exchangeability property): for any subset N ⊂ H0,

([X X̃]_swap(N))^T y =ᵈ [X X̃]^T y,

where [X X̃]_swap(N) denotes the augmented matrix with the columns X_j and X̃_j swapped for each j ∈ N.

⟹ the knockoffs are a "control group" for the nulls.
Knockoff method
Steps:

1. Construct knockoffs
2. Compute the Lasso with the augmented matrix:

   β̂^λ = argmin_{β ∈ ℝ^{2p}} { ½ ‖y − [X X̃]β‖²₂ + λ ‖β‖₁ }

3. Use X̃_j as a "control group" for X_j
Knockoff method
Fitted model for λ = 1.75 on the simulated dataset:

[Figure: fitted coefficients β̂_j for the 500 original features and the 500 knockoff features.]

• Lasso selects 49 original features & 24 knockoff features
• Pairwise exchangeability of the nulls ⟹ probably ≈ 24 false positives among the 49 original features
Knockoff method
Compute the Lasso on the entire path λ ∈ [0, ∞).

λ_j = sup{λ : β̂^λ_j ≠ 0} = first time X_j enters the Lasso path
λ̃_j = sup{λ : β̃^λ_j ≠ 0} = first time X̃_j enters the Lasso path

Then define the statistics

W_j = max{λ_j, λ̃_j} · sign(λ_j − λ̃_j)
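The entry times and the statistics W_j can be computed with a standard Lasso path solver. A sketch (my own, using scikit-learn, whose path is parametrized by α = λ/n; the signs and ordering of the W_j are unaffected by that rescaling):

```python
import numpy as np
from sklearn.linear_model import lasso_path

def knockoff_stats(X, X_tilde, y):
    """W_j = max(lambda_j, lambda_tilde_j) * sign(lambda_j - lambda_tilde_j)."""
    n, p = X.shape
    # alphas are returned in decreasing order; coefs has shape (2p, n_alphas)
    alphas, coefs, _ = lasso_path(np.hstack([X, X_tilde]), y)
    entry = np.zeros(2 * p)                 # alpha at which each feature first enters
    for j in range(2 * p):
        nz = np.flatnonzero(coefs[j])
        entry[j] = alphas[nz[0]] if nz.size else 0.0
    lam, lam_tilde = entry[:p], entry[p:]
    return np.maximum(lam, lam_tilde) * np.sign(lam - lam_tilde)
```

A feature with W_j > 0 entered the path before its knockoff; W_j = 0 means neither entered on the computed grid.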
Knockoff method
[Figure: the statistics laid out on a |W| axis with their signs + / − and null vs. signal marked. Near λ = 0 are variables that enter late (probably not significant); toward λ → ∞ are variables that enter early (likely significant).]
Knockoff method
[Scatter plot: the value of λ when X_j enters the Lasso path (x-axis) vs. the value of λ when X̃_j enters (y-axis), nulls vs. signals marked. Nulls scatter symmetrically about the diagonal; signals tend to have the original entering well before its knockoff.]
Knockoff method
Lemma 2: Pairwise exchangeability of the nulls ⟹

(W_1, W_2, …, W_p) =ᵈ (|W_1|·ε_1, |W_2|·ε_2, …, |W_p|·ε_p)

where ε_j = sign(W_j) for non-nulls and ε_j ~iid uniform{±1} for nulls.

[Figure: the |W| axis with random + / − signs attached to the null statistics.]
Knockoff method

Selected variables: S_λ = {j : W_j ≥ +λ}
Control group: S̃_λ = {j : W_j ≤ −λ}

F̂DP(S_λ) := |S̃_λ| / |S_λ|

[Scatter plot of entry times as before, nulls vs. signals marked, with the region of selected variables S_λ and the control group S̃_λ shaded.]

FDP(S_λ) = |S_λ ∩ H0| / |S_λ| ≈ |S̃_λ ∩ H0| / |S_λ| ≤ F̂DP(S_λ)
Knockoff method
The knockoff filter: define

F̂DP(S_λ) := |S̃_λ| / |S_λ| = #{j : W_j ≤ −λ} / #{j : W_j ≥ +λ},

then choose

Λ = min{λ : F̂DP(S_λ) ≤ q}  (or Λ = ∞ if this set is empty)

and select the variable set

S_Λ = {j : W_j ≥ Λ}.
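In practice the data-dependent threshold Λ can be found by scanning the magnitudes |W_j|, since F̂DP only changes at those values. A sketch of the selection rule (my own; `plus=True` adds the +1 correction of the knockoff+ variant defined on a later slide):

```python
import numpy as np

def knockoff_select(W, q, plus=False):
    """Return the threshold Lambda and the selected set {j : W_j >= Lambda}."""
    candidates = np.sort(np.unique(np.abs(W[W != 0])))  # enough to scan the |W_j| values
    offset = 1 if plus else 0
    for t in candidates:
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t, np.flatnonzero(W >= t)
    return np.inf, np.array([], dtype=int)

# toy statistics: negatives play the role of knockoffs beating originals
W = np.array([3.0, 2.5, 2.0, -1.0, 1.5, 0.5, -0.2])
Lam, S = knockoff_select(W, q=0.4)
```

With these toy values, the smallest threshold at which F̂DP ≤ 0.4 is Λ = 0.2 (two negatives against five selections, F̂DP = 2/5).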
Theoretical guarantees
Theorem 1: For S_Λ chosen by the knockoff filter,

E[mFDP(S_Λ)] ≤ q

where the modified FDP is given by

mFDP(S) = |S ∩ H0| / (|S| + q⁻¹).
Theoretical guarantees
The knockoff+ filter: define

F̂DP₊(S_λ) := (|S̃_λ| + 1) / |S_λ| = (#{j : W_j ≤ −λ} + 1) / #{j : W_j ≥ +λ},

then choose

Λ₊ = min{λ : F̂DP₊(S_λ) ≤ q}  (or Λ₊ = ∞ if this set is empty)

and select the variable set

S_Λ₊ = {j : W_j ≥ Λ₊}.
Theoretical guarantees
Theorem 2: For S_Λ₊ chosen by the knockoff+ filter,

E[FDP(S_Λ₊)] ≤ q.

Proof sketch:

FDP(S_Λ₊) = |S_Λ₊ ∩ H0| / |S_Λ₊|
          = [ (|S̃_Λ₊ ∩ H0| + 1) / |S_Λ₊| ] · [ |S_Λ₊ ∩ H0| / (|S̃_Λ₊ ∩ H0| + 1) ]

where the first factor is ≤ F̂DP₊(S_Λ₊) ≤ q and the second is controlled by a martingale argument.
Theoretical guarantees

Proof sketch cont'd:

M(λ) = |S_λ ∩ H0| / (|S̃_λ ∩ H0| + 1)

is a supermartingale with respect to increasing λ, and Λ₊ is a stopping time. By optional stopping,

E[M(Λ₊)] ≤ E[M(0)] = E[ C / (|H0| − C + 1) ] ≤ 1,

for C = # of + coin flips among the nulls, C ~ Bin(|H0|, ½).
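The final inequality can be checked directly: for C ~ Bin(m, ½), the expectation E[C/(m − C + 1)] works out to exactly 1 − 2⁻ᵐ (using c·C(m, c)/(m − c + 1) = C(m, c − 1)), hence is ≤ 1. A quick numerical confirmation (my own):

```python
from math import comb

def expected_M0(m):
    # E[C / (m - C + 1)] for C ~ Binomial(m, 1/2), computed exactly
    return sum(comb(m, c) * 0.5**m * c / (m - c + 1) for c in range(m + 1))

# e.g. expected_M0(3) = 1/8 * (0 + 1 + 3 + 3) = 7/8 = 1 - 2**-3
```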
Simulations
Setup:
• n = 3000, p = 1000, sparsity level k
• Features X_j are random unit vectors with correlation level ρ
• For signals j, β*_j ~iid {±A} for an amplitude level A
• y = Xβ + z, z ~ N(0, I_n)

Compare knockoff, knockoff+, & Benjamini-Hochberg (BHq).
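For reference, the BHq baseline in this comparison is the standard Benjamini-Hochberg step-up procedure on p-values. A minimal implementation (my own, not the authors' code):

```python
import numpy as np

def benjamini_hochberg(pvals, q):
    """BH step-up: reject the k smallest p-values, k = max{i : p_(i) <= q*i/m}."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    below = sorted_p <= q * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.flatnonzero(below)) + 1   # largest index passing the step-up rule
    return np.sort(order[:k])               # indices of rejected hypotheses
```

Note the contrast with the knockoff filter: BH needs valid per-feature p-values, which are hard to obtain for the Lasso, while knockoffs work directly from the W_j statistics.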
Simulations
• Fix the amplitude A = 3.5 & the sparsity level k = 30
• Vary the feature correlation ρ from 0 to 0.9 (set E[X_j^T X_k] = ρ^|j−k|)

[Two plots against feature correlation ρ ∈ [0, 0.9]: FDR (%) with the nominal level marked, and power (%), for Knockoff, Knockoff+, and BHq.]
HIV data
Which mutations in the RT or protease of HIV-1 determine susceptibility to RT inhibitors or protease inhibitors?

Data: Genotypic predictors of HIV type 1 drug resistance, Rhee et al (2006). Available at hivdb.stanford.edu (Stanford HIV Drug Resistance Database).

• Each drug is analysed separately
• Response y = resistance to the drug
• Features X = which mutations are present in the RT or in the protease
HIV data
The data set:

Drug type | # drugs | Sample size | # protease or RT positions genotyped | # mutations appearing ≥ 3 times in sample
PI        | 6       | 848         | 99                                   | 209
NRTI      | 6       | 639         | 240                                  | 294
NNRTI     | 3       | 747         | 240                                  | 319

To validate results — treatment-selected mutation (TSM) panel: a separate study identifies mutations frequently present in patients who have been treated with each type of drug.
HIV data
Results for PI-type drugs:

[Bar charts: # HIV-1 protease positions selected by Knockoff vs. BHq, split into positions appearing in the TSM list vs. not, one panel per drug — APV (n = 768, p = 201), ATV (n = 329, p = 147), IDV (n = 826, p = 208), LPV (n = 516, p = 184), NFV (n = 843, p = 209), SQV (n = 825, p = 208).]
HIV data
Results for NRTI- and NNRTI-type drugs:

[Bar charts: # HIV-1 RT positions selected by Knockoff vs. BHq, split by TSM-list membership, one panel per drug.
NRTI: 3TC (n = 633, p = 292), ABC (n = 628, p = 294), AZT (n = 630, p = 292), D4T (n = 630, p = 293), DDI (n = 632, p = 292), TDF (n = 353, p = 218).
NNRTI: DLV (n = 732, p = 311), EFV (n = 734, p = 318), NVP (n = 746, p = 319).]
Can knockoffs be replaced by permutations?
Let X_π = X with its rows randomly permuted. Then

[X X_π]^T [X X_π] ≈ ( Σ  0
                      0  Σ )

[Plot: value of λ when each variable enters the model vs. index of the column in the augmented matrix [X X_π]; the original non-null features (signals) enter early, then the original null features, then the permuted features.]

FDR (target level q = 20%): Knockoff method 12.29%; Permutation method 45.61%.
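The failure mode is visible numerically: permuting rows destroys the correlation between X_j and the permuted columns, so the cross block of the Gram matrix is ≈ 0 rather than Σ − 2ξI_p, and the permuted columns are a poor control group when the design is correlated. A small check (my own, with an equi-correlated design; ρ = 0.6 is my choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, rho = 2000, 3, 0.6
# rows with covariance rho*J + (1 - rho)*I (equi-correlated features)
L = np.linalg.cholesky(rho * np.ones((p, p)) + (1 - rho) * np.eye(p))
X = rng.standard_normal((n, p)) @ L.T
X_pi = X[rng.permutation(n)]          # same columns, rows randomly permuted

G_within = X.T @ X / n                # off-diagonals ~ rho = 0.6
G_cross = X.T @ X_pi / n              # all entries ~ 0: correlations destroyed
```

So a null X_j and its permuted copy do not behave exchangeably with respect to the other features, unlike a knockoff X̃_j.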
Summary
The knockoff filter for inference in a sparse linear model:

• Creates a "control group" for any type of statistic
• Handles any type of feature correlation
• Works with unknown noise level & sparsity level
• Gives finite-sample FDR guarantees
Summary
Future work:
1. How to move to high-dimensional setting?
2. Extend to GLMs or other regression models?
3. Similar principles for other problems, e.g. graphical models?
Summary
Thank you!
• Code & demos available at http://web.stanford.edu/~candes/Knockoffs/
• Paper available at http://arxiv.org/abs/1404.5609
• Joint work with Emmanuel Candès @ Stanford
• R. F. B. was partially supported by NSF award DMS-1203762. E. C. is partially supported by AFOSR under grant FA9550-09-1-0643, by NSF via grant CCF-0963835, and by the Math + X Award from the Simons Foundation.