Why and how to use random forest
variable importance measures
(and how you shouldn’t)
Carolin Strobl (LMU München) and Achim Zeileis (WU Wien)
useR! 2008, Dortmund
Outline
▶ Introduction
▶ Construction
▶ R functions
▶ Variable importance
▶ Tests for variable importance
▶ Conditional importance
▶ Summary
▶ References
Introduction

Random forests
▶ have become increasingly popular in, e.g., genetics and the neurosciences [imagine a long list of references here]
▶ can deal with “small n, large p” problems, high-order interactions, and correlated predictor variables
▶ are used not only for prediction, but also to assess variable importance
(Small) random forest

[Figure: a gallery of example classification trees from a small random forest. Each tree splits on the predictor variables Start, Number, and Age (all splits significant at p < 0.001); terminal nodes show the node size n and the class proportions y.]
Construction of a random forest
▶ draw ntree bootstrap samples from the original sample
▶ fit a classification tree to each bootstrap sample
  ⇒ ntree trees
▶ this creates a diverse set of trees, because
  ▶ trees are unstable w.r.t. changes in the learning data
    ⇒ ntree different-looking trees (bagging)
  ▶ mtry candidate splitting variables are randomly preselected in each split
    ⇒ ntree even more different-looking trees (random forest)
(a toy sketch of this construction follows below)
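To make the construction concrete, here is a toy sketch in R. It is not how randomForest or cforest are implemented; the helper name toy_forest is ours, and it simplifies the scheme by preselecting the mtry candidate variables once per tree rather than anew in each split.

```r
library(rpart)  # classification trees; also ships the kyphosis example data

toy_forest <- function(formula, data, ntree = 100, mtry = 2) {
  response   <- all.vars(formula)[1]
  predictors <- setdiff(names(data), response)
  lapply(seq_len(ntree), function(t) {
    boot <- data[sample(nrow(data), replace = TRUE), ]  # bootstrap sample
    vars <- sample(predictors, mtry)                    # random subset of candidate variables
    rpart(reformulate(vars, response), data = boot, method = "class")
  })
}

## e.g.: forest <- toy_forest(Kyphosis ~ ., data = kyphosis, ntree = 100, mtry = 2)
```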
Random forests in R
▶ randomForest (pkg: randomForest)
  ▶ reference implementation based on CART trees (Breiman, 2001; Liaw and Wiener, 2008)
  − for variables of different types: biased in favor of continuous variables and variables with many categories (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)
▶ cforest (pkg: party)
  ▶ based on unbiased conditional inference trees (Hothorn, Hornik, and Zeileis, 2006)
  + for variables of different types: unbiased when subsampling, instead of bootstrap sampling, is used (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)
(both calls are illustrated below)
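A minimal sketch of the two calls; the choice of the kyphosis data (shipped with rpart) and the parameter values are ours, and cforest_unbiased() is the default control set of cforest.

```r
library(randomForest)  # CART-based reference implementation
library(party)         # conditional inference trees / cforest

data(kyphosis, package = "rpart")

## CART-based forest: bootstrap sampling, Gini splitting
rf <- randomForest(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   ntree = 500, importance = TRUE)

## conditional inference forest; cforest_unbiased() uses subsampling,
## which keeps variable selection and importance unbiased
cf <- cforest(Kyphosis ~ Age + Number + Start, data = kyphosis,
              controls = cforest_unbiased(ntree = 500, mtry = 2))
```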
Measuring variable importance
▶ Gini importance
  mean Gini gain produced by Xj over all trees
  ▶ obj <- randomForest(..., importance=TRUE)
    obj$importance, column MeanDecreaseGini
    importance(obj, type=2)
  − for variables of different types: biased in favor of continuous variables and variables with many categories
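For example, with the rf object fitted in the earlier sketch (the object name is an assumption from that sketch):

```r
## Gini importance: mean decrease in node impurity over all trees
importance(rf, type = 2)
rf$importance[, "MeanDecreaseGini"]
```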
Measuring variable importance
▶ permutation importance
  mean decrease in classification accuracy after permuting Xj, averaged over all trees
  ▶ obj <- randomForest(..., importance=TRUE)
    obj$importance, column MeanDecreaseAccuracy
    importance(obj, type=1)
  ▶ obj <- cforest(...)
    varimp(obj)
  + for variables of different types: unbiased only when subsampling is used, as in cforest(..., controls = cforest_unbiased())
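Continuing with the rf and cf objects from the sketch above:

```r
## permutation importance in both packages
importance(rf, type = 1)   # column MeanDecreaseAccuracy
varimp(cf)                 # cforest permutation importance
```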
The permutation importance

within each tree t:

VI^{(t)}(x_j) \;=\; \frac{\sum_{i \in \bar{\mathcal{B}}^{(t)}} I\!\left(y_i = \hat{y}_i^{(t)}\right)}{\left|\bar{\mathcal{B}}^{(t)}\right|} \;-\; \frac{\sum_{i \in \bar{\mathcal{B}}^{(t)}} I\!\left(y_i = \hat{y}_{i,\pi_j}^{(t)}\right)}{\left|\bar{\mathcal{B}}^{(t)}\right|}

where \bar{\mathcal{B}}^{(t)} denotes the out-of-bag sample of tree t,
\hat{y}_i^{(t)} = f^{(t)}(x_i) = predicted class before permuting,
\hat{y}_{i,\pi_j}^{(t)} = f^{(t)}(x_{i,\pi_j}) = predicted class after permuting X_j, and
x_{i,\pi_j} = (x_{i,1}, \ldots, x_{i,j-1}, x_{\pi_j(i),j}, x_{i,j+1}, \ldots, x_{i,p}).

Note: VI^{(t)}(x_j) = 0 by definition if X_j is not in tree t.
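A literal reading of this formula in R (a sketch with our own naming; it assumes a single fitted rpart classification tree, a data frame oob holding its out-of-bag observations, and the column names of the predictor to permute and of the response):

```r
## per-tree permutation importance: OOB accuracy before minus after permuting Xj
perm_imp_tree <- function(tree, oob, xj, response) {
  acc <- function(d) mean(predict(tree, newdata = d, type = "class") == d[[response]])
  oob_perm <- oob
  oob_perm[[xj]] <- sample(oob_perm[[xj]])  # apply the permutation pi_j to Xj
  acc(oob) - acc(oob_perm)                  # VI^(t)(xj); exactly 0 if Xj is not used in the tree
}
```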
The permutation importance

over all trees:

1. raw importance

VI(x_j) = \frac{\sum_{t=1}^{ntree} VI^{(t)}(x_j)}{ntree}

▶ obj <- randomForest(..., importance=TRUE)
  importance(obj, type=1, scale=FALSE)
The permutation importance

over all trees:

2. scaled importance (z-score)

z_j = \frac{VI(x_j)}{\hat{\sigma} / \sqrt{ntree}}

▶ obj <- randomForest(..., importance=TRUE)
  importance(obj, type=1, scale=TRUE) (default)
Tests for variable importance

for variable selection purposes
▶ Breiman and Cutler (2008): simple significance test based on normality of the z-score
  randomForest, scale=TRUE + α-quantile of N(0,1)
▶ Diaz-Uriarte and Alvarez de Andres (2006): backward elimination (throw out the least important variables until the out-of-bag prediction accuracy drops)
  varSelRF (pkg: varSelRF), depends on randomForest
▶ Diaz-Uriarte (2007) and Rodenburg et al. (2008): plots and significance test (randomly permute the response values to mimic the overall null hypothesis that none of the predictor variables is relevant = baseline)
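For the backward-elimination approach, a usage sketch (x and y are placeholders for a predictor matrix and a factor response; the shortened ntree value is only for illustration, and argument/component names follow the varSelRF package as documented in ?varSelRF):

```r
library(varSelRF)

## backward elimination: repeatedly drop the least important variables and
## keep the smallest variable set whose OOB error has not degraded
sel <- varSelRF(xdata = x, Class = y, ntree = 2000, vars.drop.frac = 0.2)
sel$selected.vars  # variables retained by the procedure
```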
Tests for variable importance

problems of these approaches:
▶ (at least) Breiman and Cutler (2008): strange statistical properties (Strobl and Zeileis, 2008)
▶ all: preference for correlated predictor variables (see also Nicodemus and Shugart, 2007; Archer and Kimes, 2008)
Breiman and Cutler’s test

under the null hypothesis of zero importance:

z_j \overset{\text{as.}}{\sim} N(0, 1)

if z_j exceeds the upper α-quantile of N(0,1) ⇒ reject the null hypothesis of zero importance for variable X_j
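In code, the test amounts to two lines (a sketch; rf is the randomForest object from the earlier sketch and alpha our chosen significance level):

```r
alpha <- 0.05
z <- importance(rf, type = 1, scale = TRUE)  # scaled importance = z-scores (the default scaling)
z > qnorm(1 - alpha)                         # variables declared "important" under the normality assumption
```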
Raw importance

[Figure: mean raw permutation importance plotted against relevance (0 to 0.4), in three panels for ntree = 100, 200, 500, with sample sizes 100, 200, 500.]
z-score and power

[Figure: power (top row) and z-score (bottom row) plotted against relevance (0 to 0.4), in panels for ntree = 100, 200, 500, with sample sizes 100, 200, 500.]
Findings

z-score and power
▶ increase with ntree
▶ decrease with sample size
⇒ rather use the raw, unscaled permutation importance!
importance(obj, type=1, scale=FALSE)
varimp(obj)
What null hypothesis were we testing in the first place?

Permuting X_j destroys its association with both the response Y and the other predictor variables Z:

obs    Y      X_j              Z
1      y_1    x_{π_j(1),j}     z_1
...    ...    ...              ...
i      y_i    x_{π_j(i),j}     z_i
...    ...    ...              ...
n      y_n    x_{π_j(n),j}     z_n

H_0: X_j ⊥ Y, Z   or equivalently   X_j ⊥ Y ∧ X_j ⊥ Z

P(Y, X_j, Z) \overset{H_0}{=} P(Y, Z) · P(X_j)
What null hypothesis were we testing in the first place?

The current null hypothesis reflects independence of X_j from both Y and the remaining predictor variables Z.
⇒ a high variable importance can result from a violation of either one!
Suggestion: Conditional permutation scheme

Permute X_j only within groups of observations with Z fixed (a sketch of such a grouped permutation follows below):

obs    Y       X_j                    Z
1      y_1     x_{π_{j|Z=a}(1),j}     z_1 = a
3      y_3     x_{π_{j|Z=a}(3),j}     z_3 = a
27     y_27    x_{π_{j|Z=a}(27),j}    z_27 = a
6      y_6     x_{π_{j|Z=b}(6),j}     z_6 = b
14     y_14    x_{π_{j|Z=b}(14),j}    z_14 = b
33     y_33    x_{π_{j|Z=b}(33),j}    z_33 = b
...    ...     ...                    ...

H_0: X_j ⊥ Y | Z

P(Y, X_j | Z) \overset{H_0}{=} P(Y | Z) · P(X_j | Z)   or   P(Y | X_j, Z) \overset{H_0}{=} P(Y | Z)
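A minimal sketch of the grouped permutation underlying this scheme (the helper name is ours; it assumes a data frame d, the name xj of the numeric predictor to permute, and a grouping variable z that discretizes the conditioning variables):

```r
## permute xj separately within each level of z, leaving y and z untouched:
## the association between xj and z is preserved, only that with y is broken
cond_permute <- function(d, xj, z) {
  d[[xj]] <- ave(d[[xj]], d[[z]], FUN = function(v) v[sample(length(v))])
  d
}
```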
Technically
▶ use any partition of the feature space for conditioning
▶ here: use the binary partition already learned by the tree (use its cutpoints as bisectors of the feature space)
▶ condition on all correlated variables, or only on a selected subset
(Strobl et al., 2008)

available in cforest from version 0.9-994: varimp(obj, conditional = TRUE)
Simulation study
▶ dgp: y_i = β_1 · x_{i,1} + · · · + β_{12} · x_{i,12} + ε_i,  ε_i \overset{i.i.d.}{\sim} N(0, 0.5)
▶ X_1, ..., X_{12} ∼ N(0, Σ), with the first four predictor variables block-correlated:

Σ = \begin{pmatrix}
1   & 0.9 & 0.9 & 0.9 & 0 & \cdots & 0 \\
0.9 & 1   & 0.9 & 0.9 & 0 & \cdots & 0 \\
0.9 & 0.9 & 1   & 0.9 & 0 & \cdots & 0 \\
0.9 & 0.9 & 0.9 & 1   & 0 & \cdots & 0 \\
0   & 0   & 0   & 0   & 1 & \cdots & 0 \\
\vdots &  &     &     &   & \ddots & \vdots \\
0   & 0   & 0   & 0   & 0 & \cdots & 1
\end{pmatrix}

X_j    X1   X2   X3   X4   X5   X6   X7   X8   · · ·   X12
β_j     5    5    2    0   −5   −5   −2    0   · · ·     0

(a sketch of this data-generating process is given below)
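A sketch of the data-generating process and the resulting importance comparison (the script, sample size, and forest parameters are our own choices; N(0, 0.5) is read here as variance 0.5):

```r
library(mvtnorm)  # rmvnorm for correlated predictors
library(party)

set.seed(42)
n <- 200
Sigma <- diag(12)
Sigma[1:4, 1:4] <- 0.9            # X1-X4 pairwise correlated ...
diag(Sigma) <- 1                  # ... with unit variances
X    <- rmvnorm(n, sigma = Sigma)
beta <- c(5, 5, 2, 0, -5, -5, -2, rep(0, 5))
y    <- X %*% beta + rnorm(n, sd = sqrt(0.5))
dat  <- data.frame(y = as.numeric(y), X)

cf <- cforest(y ~ ., data = dat,
              controls = cforest_unbiased(ntree = 500, mtry = 3))
varimp(cf)                        # unconditional permutation importance
varimp(cf, conditional = TRUE)    # conditional permutation importance
```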
Results

[Figure: boxplots of the variable importance for the twelve predictor variables (x-axis: variable 1-12), in three panels for mtry = 1, 3, and 8.]
Peptide-binding data

[Figure: unconditional vs. conditional variable importance for the peptide-binding data (importance scale 0 to 0.005); the predictors h2y8, flex8, and pol3* are highlighted.]
Summary
▶ if your predictor variables are of different types: use cforest (pkg: party) with the default option controls = cforest_unbiased() and the permutation importance varimp(obj)
▶ otherwise: feel free to use cforest (pkg: party) with the permutation importance varimp(obj), or randomForest (pkg: randomForest) with the permutation importance importance(obj, type=1) or the Gini importance importance(obj, type=2), but don't fall for the z-score! (i.e. set scale=FALSE)
▶ if your predictor variables are highly correlated: use the conditional importance in cforest (pkg: party)
(the three recommendations are collected in the sketch below)
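The three recommendations in one place (a sketch; dat and y are placeholders for your data and response):

```r
## predictors of different types: unbiased forest + permutation importance
cf <- cforest(y ~ ., data = dat, controls = cforest_unbiased())
varimp(cf)

## otherwise randomForest works as well, but use the unscaled permutation importance
rf <- randomForest(y ~ ., data = dat, importance = TRUE)
importance(rf, type = 1, scale = FALSE)

## highly correlated predictors: conditional permutation importance
varimp(cf, conditional = TRUE)
```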
References
Archer, K. J. and R. V. Kimes (2008). Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis 52(4), 2249–2260.

Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32.

Breiman, L. and A. Cutler (2008). Random forests – Classification manual. Website accessed in 1/2008; http://www.math.usu.edu/~adele/forests.

Breiman, L., A. Cutler, A. Liaw, and M. Wiener (2006). Breiman and Cutler's Random Forests for Classification and Regression. R package version 4.5-16.

Diaz-Uriarte, R. (2007). GeneSrF and varSelRF: A web-based tool and R package for gene selection and classification using random forest. BMC Bioinformatics 8:328.
Hothorn, T., K. Hornik, and A. Zeileis (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15(3), 651–674.

Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8:25.

Strobl, C. and A. Zeileis (2008). Danger: High power! – Exploring the statistical properties of a test for random forest variable importance. In Proceedings of the 18th International Conference on Computational Statistics, Porto, Portugal.

Strobl, C., A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis (2008). Conditional variable importance for random forests. BMC Bioinformatics 9:307.