Bounding XCS's Parameters for Unbalanced Datasets

Albert Orriols-Puig, Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle, Ramon Llull University
Barcelona, Spain
Framework
[Diagram: a Dataset, consisting of examples and counter-examples, feeds a Learner; the Learner extracts knowledge and builds a Model; given a new instance, the Model uses the information based on experience to produce a predicted output.]
In real-world domains, it is typically more costly to obtain examples of the concept to be learnt, so the distribution of examples in the training dataset is usually unbalanced.

Applications: fraud detection, rare medical diagnosis, detection of oil spills in satellite images.

Slide 2 - GRSI Enginyeria i Arquitectura la Salle
Framework
Do learners suffer from class imbalances?
A learner trained on the training set minimizes the global error:

error = (num errors c1 + num errors c2) / number of examples

This measure is biased towards the overwhelming (majority) class: its accuracy is maximized in detriment of the minority class.
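A toy numeric sketch of why minimizing the global error favors the majority class (the 990/10 split is an illustrative assumption, not from the slides):

```python
# Global error, as defined above, pools the mistakes of both classes.
def global_error(errors_c1, errors_c2, n_examples):
    return (errors_c1 + errors_c2) / n_examples

# With 990 majority and 10 minority instances, a model that always predicts
# the majority class misclassifies only the 10 minority examples:
# global_error(0, 10, 1000) -> 0.01, despite 100% error on the minority class.
```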
Aim
Analyze the performance of XCS when learning from imbalanced datasets
Analyze the contribution of the different components
Propose approaches that facilitate learning minority-class regions
Outline
1. Description of XCS
2. Description of the Domain
3. Experimentation
4. XCS and Class Imbalances
5. Guidelines for Parameter Tuning
6. Online Adaptation
7. Conclusions
1. Description of XCS
In single-step tasks:
[Diagram: XCS's interaction cycle. The environment supplies a problem instance. The classifiers in the population [P] (each with condition C, action A, prediction P, error ε, fitness F, numerosity num, action-set size as, time stamp ts, and experience exp) that match the instance form the match set [M]. A prediction array over the actions c1, c2, …, cn selects an action (or a random action during exploration), yielding the action set [A]. The environment returns a REWARD, which is used to update the classifier parameters in [A]. A genetic algorithm (selection, reproduction, mutation) is applied in [A], and deletion keeps the population within bounds.]
1. Description of XCS
Learning domain
[Diagram: XCS maintains a set of rules, evolved by a GA and evaluated by reinforcement learning; it interacts with the environment through predictions and rewards.]
Ratio between classes: 525:75

1 minority class example for every 7 majority class examples
2. Description of the Domain
(11-bit) Multiplexer / Imbalanced Multiplexer

The condition consists of 3 selection (position) bits and 8 value bits. Example: 000 10010100 : 1. The complexity is related to the number of selection bits. The multiplexer is completely balanced.
• We under-sampled class 1
• ir: proportion between majority and minority class instances
• i: imbalance level (i = log2 ir)

XCS should evolve:

000 0#######:0   000 0#######:1   000 1#######:0   000 1#######:1
001 #0######:0   001 #0######:1   001 #1######:0   001 #1######:1
010 ##0#####:0   010 ##0#####:1   010 ##1#####:0   010 ##1#####:1
011 ###0####:0   011 ###0####:1   011 ###1####:0   011 ###1####:1
100 ####0###:0   100 ####0###:1   100 ####1###:0   100 ####1###:1
101 #####0##:0   101 #####0##:1   101 #####1##:0   101 #####1##:1
110 ######0#:0   110 ######0#:1   110 ######1#:0   110 ######1#:1
111 #######0:0   111 #######0:1   111 #######1:0   111 #######1:1
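A minimal sketch of the 11-bit multiplexer and of the under-sampling scheme described above; the function names and the rejection-sampling loop are illustrative assumptions, not the authors' exact generator:

```python
import random

# The 11-bit multiplexer: the 3 address bits select one of the 8 value bits,
# and the selected bit is the class of the instance.
def mux11(bits):
    addr = int("".join(map(str, bits[:3])), 2)  # 3 selection bits
    return bits[3 + addr]                        # selected value bit

# Imbalanced version: class-1 instances are kept only 1 out of ir times,
# so the training stream is under-sampled in the minority class.
def sample_imbalanced(ir, rng=random):
    while True:
        bits = [rng.randint(0, 1) for _ in range(11)]
        cls = mux11(bits)
        if cls == 0 or rng.random() < 1.0 / ir:
            return bits, cls
```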
3. Experimentation
We ran XCS with the following standard configuration from i=0 (ir=1:1) to i=9 (ir=512:1):

N=800, α=0.1, ν=5, Rmax=1000, ε0=1, θGA=25, β=0.2, χ=0.8, μ=0.4, θdel=20, δ=0.1, θsub=200, P#=0.6, selection=rws, mutation=niched, GAsub=true, [A]sub=false
3. Experimentation
[Plots: true negative rate and true positive rate along training, for ir = 16:1, 32:1, and 64:1.]
3. Experimentation
Most numerous rules, ir = 128:1

Condition:Action    P            Error   F     Num
###########:0       1000         0.120   0.98  385
###########:1       1.2 · 10^-4  0.074   0.98  366

The estimated parameters deviate from the theoretical values: P(:0) = 992.24, P(:1) = 7.75, ε(:0) = ε(:1) = 15.38. With such low error estimates, these overgeneral classifiers pass the accuracy criterion and overtake the population (they represent 94% of the population).
4. XCS and Class Imbalances
We analyze the following factors:
Classifiers' Error

Stability of Prediction and Error Estimates
Occurrence-based Reproduction
4. XCS and Class Imbalances
4.1. Classifiers' Error
How does the imbalance ratio influence a classifier's error?

XCS receives a reward of Rmax (correct prediction) or 0 (incorrect prediction), and considers a classifier accurate if:

ε_cl < ε0

XCS computes the classifiers' prediction (p) and error (ε) as window averages:

• Prediction: p_{t+1} = p_t + β (R − p_t)
• Error: ε_{t+1} = ε_t + β (|R − p_t| − ε_t)
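The window averages above can be sketched as follows; the alternating-reward schedule is a simplifying assumption that models an overgeneral classifier under imbalance ratio ir (Rmax = 1000 and β = 0.2 as in the slides):

```python
# Widrow-Hoff updates of a classifier's prediction and error estimates.
def update(p, eps, R, beta=0.2):
    p = p + beta * (R - p)                  # prediction: window average of R
    eps = eps + beta * (abs(R - p) - eps)   # error: window average of |R - p|
    return p, eps

# An overgeneral rule predicting the majority class receives Rmax on ir
# majority instances and 0 on 1 minority instance, repeatedly.
def run(ir, steps=10000, Rmax=1000.0, beta=0.2):
    p, eps = 0.0, 0.0
    for t in range(steps):
        R = Rmax if t % (ir + 1) != 0 else 0.0
        p, eps = update(p, eps, R, beta)
    return p, eps
```

With ir = 1 the error estimate stays large (the overgeneral is detected), while with ir = 128 the long majority streaks let both estimates relax towards an accurate-looking classifier.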
4. XCS and Class Imbalances
4.1. Classifiers' Error
Up to which class imbalance will XCS detect overgeneral classifiers?

– Bound for an inaccurate classifier: ε ≥ ε0

– Given the estimated prediction and error of an overgeneral classifier cl (with rewards Rmax and Rmin = 0):

  p(cl) = P(c|cl) · Rmax + (1 − P(c|cl)) · Rmin

  ε(cl) = P(c|cl) · |Rmax − p(cl)| + (1 − P(c|cl)) · |Rmin − p(cl)|

– Substituting p(cl), the condition ε ≥ ε0 becomes:

  −2p² + 2·Rmax·p − ε0·Rmax ≥ 0

– For the overgeneral classifier, p = Rmax · ir/(1 + ir). With Rmax = 1000 and ε0 = 1, the resulting quadratic in ir has roots 1/1998 and 1998: overgeneral classifiers are detected for 1/1998 ≤ ir ≤ 1998 and not detected outside this interval.

– We get the maximum imbalance ratio:

  irmax = 1998, i.e., imax = 10
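The bound above can be checked numerically. Substituting p = Rmax·ir/(1+ir) into ε ≥ ε0 gives 2·Rmax·ir ≥ ε0·(1+ir)², i.e., ir² − (2·Rmax/ε0 − 2)·ir + 1 ≤ 0; a sketch solving that quadratic:

```python
import math

# The largest root of ir^2 - (2*Rmax/eps0 - 2)*ir + 1 = 0 is the maximum
# imbalance ratio at which an overgeneral classifier is still flagged
# as inaccurate (eps >= eps0).
def ir_max(Rmax=1000.0, eps0=1.0):
    b = 2.0 * Rmax / eps0 - 2.0   # middle coefficient of the quadratic
    return (b + math.sqrt(b * b - 4.0)) / 2.0

# With the slides' settings this gives ir_max close to 1998, hence
# i_max = floor(log2(ir_max)) = 10.
```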
4. XCS and Class Imbalances
4.1. Classifiers' Error

XCS computes the classifiers' prediction (p) and error (ε) as window averages:

– Prediction: p_{t+1} = p_t + β (R − p_t)
– Error: ε_{t+1} = ε_t + β (|R − p_t| − ε_t)

β determines the size of the window: the influence of each reward decays over time, and the effect of previous rewards is eventually forgotten. The smaller β, the longer past rewards keep influencing the estimates.

[Plot: influence of the reward received at time t on the estimates at t+1 … t+8, for β = 0.2, β = 0.1, and β = 0.05.]
4. XCS and Class Imbalances
4.2. Stability of Prediction and Error Estimates

Stability of prediction and error for ir = 128:1

[Density plots of the overgeneral classifiers' prediction (range 900 to 1000) and error (range 0 to 100) estimates, for β = 0.2 (top) and β = 0.002 (bottom), with the theoretical reference values 992.24 and 7.75 marked. With β = 0.2 the estimates are spread over a wide range; with β = 0.002 they concentrate around the theoretical values.]

As ir increases, β should be decreased to stabilize the prediction and error estimates.
4. XCS and Class Imbalances
4.3. Occurrence-based Reproduction

To receive a GA event, a classifier has to belong to [A].

Frequency of occurrence (11-Mux, ir = 128:1), with ℓs selection bits:

Classifier           p_occ
000 0#######:0       ir / (2^(ℓs+1) · (1 + ir)) = 0.062
000 1#######:1       1 / (2^(ℓs+1) · (1 + ir)) = 0.000484
### ########:0       1/2 = 0.5
### ########:1       1/2 = 0.5

[Plot: p_occ as a function of ir (0 to 500) for the three types of classifier.]

Classifiers that occur more frequently:
– Have better estimates
– Tend to have more genetic opportunities… depending on θGA
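The occurrence probabilities above can be sketched as follows (assuming uniformly distributed addresses and random action selection during exploration, so a matching classifier lands in [A] half the time):

```python
# Occurrence probability of a classifier in [A], for the imbalanced
# multiplexer with ls selection bits and imbalance ratio ir.
def p_occ_majority(ir, ls=3):
    # covers one majority niche: matches 1/2**ls of the majority stream,
    # and is in [A] for half of the (random) action selections
    return ir / (2 ** (ls + 1) * (1 + ir))

def p_occ_minority(ir, ls=3):
    # covers one minority niche, which is under-sampled by factor ir
    return 1 / (2 ** (ls + 1) * (1 + ir))

def p_occ_overgeneral():
    return 0.5  # matches every instance; in [A] for half the random actions
```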
4. XCS and Class Imbalances
4.3. Occurrence-based Reproduction

Genetic opportunities
– A classifier goes through a genetic event when:
  • It occurs in [A]
  • The average time since the last GA application in [A] exceeds θGA

[Timelines of GA events with θGA = 25. The overgeneral classifier ###########:0/1 occurs at every time step (small Tocc), so it receives a GA event roughly every 25 steps (at times 25, 50, 75, 100, …); the minority-niche classifier 000 1#######:1 occurs far less often, so its GA events are much rarer.]

To balance the genetic opportunities that the different niches receive, set θGA = Tocc of the most infrequent niche.
5. Guidelines for Parameter Tuning

From the analysis we can extract the following guidelines:

Rmax and ε0 determine the threshold between negligible noise and imbalance ratio.

β represents the reward forgetfulness rate. We want the window average to still account for the under-sampled instances:

  β = (1/k) · (fmin / fmaj)

θGA sets the GA rate when Tocc < θGA. If we want all niches to receive the same number of genetic opportunities:

  θGA = k / (2 · fmin)

where fmin and fmaj are the occurrence frequencies of the most infrequent and most frequent niches, and k is a constant.
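A sketch of the two guidelines above (k is a free proportionality constant; fmin and fmaj are the niche occurrence frequencies, with fmin/fmaj = 1/ir in the imbalanced multiplexer):

```python
# beta guideline: forget rewards slowly enough that under-sampled
# instances still contribute to the window averages.
def beta_guideline(f_min, f_maj, k=1.0):
    return (1.0 / k) * (f_min / f_maj)

# theta_GA guideline: give the most infrequent niche time to occur
# between consecutive GA events.
def theta_ga_guideline(f_min, k=1.0):
    return k / (2.0 * f_min)

# Doubling ir halves beta and doubles theta_GA, which matches the
# slides' experimental schedule (beta = 0.04, 0.02, 0.01, 0.005 and
# theta_GA = 200, 400, 800, 1600).
```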
5. Guidelines for Parameter Tuning
We set β = {0.04, 0.02, 0.01, 0.005} and θGA = {200, 400, 800, 1600}.

[Plots: performance with the standard configuration (ir = 16:1, 32:1, 64:1) versus the configuration following the guidelines (ir = 64:1, 128:1, 256:1).]
6. Online Adaptation
Problem: how can we estimate the niche frequency?

– In the multiplexer: fmin = fmaj / ir

– In a real-world problem, niche frequencies may not be related to the imbalance ratio: small disjuncts.

[Figures: two domains, both with ir = 5; in one the minority class forms a single region, in the other it is split into several small disjuncts.]
6. Online Adaptation
Our approach: let XCS discover the small disjuncts.

We search for regions that promote overgeneral classifiers.
We estimate ircl based on those regions.
We use ircl to adapt β and θGA.

[Figure: an overgeneral classifier covering a region with ircl = 14:1.]
6. Online Adaptation
The Algorithm
– Check whether the classifier's prediction oscillates.
– Estimate the imbalance ratio ircl.
– Require a minimum of experience and numerosity before adapting the parameters.
– Adapt the parameters (β and θGA) following the guidelines and the estimation of ircl.
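A hedged sketch of the adaptation steps above; the field names, thresholds, and the oscillation test are illustrative assumptions, not the authors' exact algorithm:

```python
def adapt(cl, exp_min=20, num_min=5, k=1.0):
    """Adapt beta and theta_GA of a classifier cl (a dict of its stats)."""
    # 1. check if the prediction oscillates between Rmax and 0,
    #    which suggests the classifier is overgeneral
    oscillates = cl["p_max_seen"] - cl["p_min_seen"] > cl["Rmax"] / 2
    if not oscillates:
        return cl
    # 2. estimate the imbalance ratio of the region the classifier covers
    ir_cl = cl["majority_matches"] / max(cl["minority_matches"], 1)
    # 3. require minimum experience and numerosity before trusting ir_cl
    if cl["exp"] < exp_min or cl["num"] < num_min:
        return cl
    # 4. adapt the parameters following the guidelines
    cl["beta"] = (1.0 / k) * (1.0 / ir_cl)     # beta ~ fmin/fmaj = 1/ir_cl
    cl["theta_GA"] = k * (1.0 + ir_cl)         # ~ Tocc of the rarest niche
    return cl
```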
6. Online Adaptation
[Plots: performance with the standard configuration (ir = 16:1, 32:1, 64:1), the configuration following the guidelines, and the online adaptation (ir = 64:1, 128:1, 256:1).]
7. Conclusions
We studied the behavior of XCS when the training set is unbalanced
XCS with the standard configuration can only solve the multiplexer up to an imbalance ratio of ir = 16.

The theoretical analysis shows that XCS is highly robust to class imbalances if:

– Classifier estimates are accurate
– The number of genetic opportunities of the niches is balanced

We defined guidelines to adapt XCS's parameters:

– XCS could solve the multiplexer up to an imbalance ratio of ir = 256
7. Conclusions
As an advantage over other learners, XCS can automatically discover small disjuncts: self-adaptation of parameters.
7. Further Work
What about the convergence time?
– An increase of θGA means a decrease of the search for promising rules.

Cluster-based resampling methods… unfortunately, there is no direct relation between cluster and niche.

What about niche-based resampling?

[Figure: a niche with irniche = 14:1; resample these instances according to 1/irniche.]