Bounding XCS's Parameters for Unbalanced Datasets

Albert Orriols-Puig, Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle, Ramon Llull University
Barcelona, Spain
Framework
[Diagram: a Dataset, consisting of examples and counter-examples, feeds a Learner; the Learner extracts knowledge and builds a Model; given a new instance, the Model uses the information based on experience to produce a predicted output.]
In real-world domains, it is typically more costly to obtain examples of the concept to be learnt, so the distribution of examples in the training dataset is usually unbalanced.

Applications: fraud detection, rare medical diagnosis, detection of oil spills in satellite images.

Slide 2 - GRSI Enginyeria i Arquitectura la Salle
Framework
Do learners suffer from class imbalances?
A learner trained on the training set minimizes the global error:

error = (num errors c1 + num errors c2) / number of examples

This measure is biased towards the overwhelming (majority) class: its accuracy is maximized in detriment of the minority class.
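A toy numeric sketch of why minimizing the global error favors the majority class (the 990/10 split is an illustrative assumption, not from the slides):

```python
# Global error, as defined above, pools the mistakes of both classes.
def global_error(errors_c1, errors_c2, n_examples):
    return (errors_c1 + errors_c2) / n_examples

# With 990 majority and 10 minority instances, a model that always predicts
# the majority class misclassifies only the 10 minority examples:
# global_error(0, 10, 1000) -> 0.01, despite 100% error on the minority class.
```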
Aim
Analyze the performance of XCS when learning from imbalanced datasets
Analyze the contribution of the different components
Propose approaches that facilitate learning minority-class regions
Outline
1. Description of XCS
2. Description of the Domain
3. Experimentation
4. XCS and Class Imbalances
5. Guidelines for Parameter Tuning
6. Online Adaptation
7. Conclusions
1. Description of XCS
In single-step tasks:
[Diagram: XCS's interaction cycle. The environment supplies a problem instance. The classifiers in the population [P] (each with condition C, action A, prediction P, error ε, fitness F, numerosity num, action-set size as, time stamp ts, and experience exp) that match the instance form the match set [M]. A prediction array over the actions c1, c2, …, cn selects an action (or a random action during exploration), yielding the action set [A]. The environment returns a REWARD, which is used to update the classifier parameters in [A]. A genetic algorithm (selection, reproduction, mutation) is applied in [A], and deletion keeps the population within bounds.]
1. Description of XCS
Learning domain
[Diagram: XCS maintains a set of rules, evolved by a GA and evaluated by reinforcement learning; it interacts with the environment through predictions and rewards.]
Ratio between classes: 525:75

1 minority class example for every 7 majority class examples
2. Description of the Domain
(11-bit) Multiplexer / Imbalanced Multiplexer

The condition consists of 3 selection (position) bits and 8 value bits. Example: 000 10010100 : 1. The complexity is related to the number of selection bits. The multiplexer is completely balanced.
• We under-sampled class 1
• ir: proportion between majority and minority class instances
• i: imbalance level (i = log2 ir)

XCS should evolve:

000 0#######:0   000 0#######:1   000 1#######:0   000 1#######:1
001 #0######:0   001 #0######:1   001 #1######:0   001 #1######:1
010 ##0#####:0   010 ##0#####:1   010 ##1#####:0   010 ##1#####:1
011 ###0####:0   011 ###0####:1   011 ###1####:0   011 ###1####:1
100 ####0###:0   100 ####0###:1   100 ####1###:0   100 ####1###:1
101 #####0##:0   101 #####0##:1   101 #####1##:0   101 #####1##:1
110 ######0#:0   110 ######0#:1   110 ######1#:0   110 ######1#:1
111 #######0:0   111 #######0:1   111 #######1:0   111 #######1:1
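A minimal sketch of the 11-bit multiplexer and of the under-sampling scheme described above; the function names and the rejection-sampling loop are illustrative assumptions, not the authors' exact generator:

```python
import random

# The 11-bit multiplexer: the 3 address bits select one of the 8 value bits,
# and the selected bit is the class of the instance.
def mux11(bits):
    addr = int("".join(map(str, bits[:3])), 2)  # 3 selection bits
    return bits[3 + addr]                        # selected value bit

# Imbalanced version: class-1 instances are kept only 1 out of ir times,
# so the training stream is under-sampled in the minority class.
def sample_imbalanced(ir, rng=random):
    while True:
        bits = [rng.randint(0, 1) for _ in range(11)]
        cls = mux11(bits)
        if cls == 0 or rng.random() < 1.0 / ir:
            return bits, cls
```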
3. Experimentation
We ran XCS with the following standard configuration from i=0 (ir=1:1) to i=9 (ir=512:1):

N=800, α=0.1, ν=5, Rmax=1000, ε0=1, θGA=25, β=0.2, χ=0.8, μ=0.4, θdel=20, δ=0.1, θsub=200, P#=0.6, selection=rws, mutation=niched, GAsub=true, [A]sub=false
3. Experimentation
[Plots: true negative rate and true positive rate along training, for ir = 16:1, 32:1, and 64:1.]
3. Experimentation
Most numerous rules, ir = 128:1

Condition:Action    P            Error   F     Num
###########:0       1000         0.120   0.98  385
###########:1       1.2 · 10^-4  0.074   0.98  366

The estimated parameters deviate from the theoretical values: P(:0) = 992.24, P(:1) = 7.75, ε(:0) = ε(:1) = 15.38. With such low error estimates, these overgeneral classifiers pass the accuracy criterion and overtake the population (they represent 94% of the population).
4. XCS and Class Imbalances
We analyze the following factors:
Classifiers' Error

Stability of Prediction and Error Estimates
Occurrence-based Reproduction
4. XCS and Class Imbalances
4.1. Classifiers' Error
How does the imbalance ratio influence a classifier's error?

XCS receives a reward of Rmax (correct prediction) or 0 (incorrect prediction), and considers a classifier accurate if:

ε_cl < ε0

XCS computes the classifiers' prediction (p) and error (ε) as window averages:

• Prediction: p_{t+1} = p_t + β (R − p_t)
• Error: ε_{t+1} = ε_t + β (|R − p_t| − ε_t)
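The window averages above can be sketched as follows; the alternating-reward schedule is a simplifying assumption that models an overgeneral classifier under imbalance ratio ir (Rmax = 1000 and β = 0.2 as in the slides):

```python
# Widrow-Hoff updates of a classifier's prediction and error estimates.
def update(p, eps, R, beta=0.2):
    p = p + beta * (R - p)                  # prediction: window average of R
    eps = eps + beta * (abs(R - p) - eps)   # error: window average of |R - p|
    return p, eps

# An overgeneral rule predicting the majority class receives Rmax on ir
# majority instances and 0 on 1 minority instance, repeatedly.
def run(ir, steps=10000, Rmax=1000.0, beta=0.2):
    p, eps = 0.0, 0.0
    for t in range(steps):
        R = Rmax if t % (ir + 1) != 0 else 0.0
        p, eps = update(p, eps, R, beta)
    return p, eps
```

With ir = 1 the error estimate stays large (the overgeneral is detected), while with ir = 128 the long majority streaks let both estimates relax towards an accurate-looking classifier.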
4. XCS and Class Imbalances
4.1. Classifiers' Error
Up to which class imbalance will XCS detect overgeneral classifiers?

– Bound for an inaccurate classifier: ε ≥ ε0

– Given the estimated prediction and error of an overgeneral classifier cl (with rewards Rmax and Rmin = 0):

  p(cl) = P(c|cl) · Rmax + (1 − P(c|cl)) · Rmin

  ε(cl) = P(c|cl) · |Rmax − p(cl)| + (1 − P(c|cl)) · |Rmin − p(cl)|

– Substituting p(cl), the condition ε ≥ ε0 becomes:

  −2p² + 2·Rmax·p − ε0·Rmax ≥ 0

– For the overgeneral classifier, p = Rmax · ir/(1 + ir). With Rmax = 1000 and ε0 = 1, the resulting quadratic in ir has roots 1/1998 and 1998: overgeneral classifiers are detected for 1/1998 ≤ ir ≤ 1998 and not detected outside this interval.

– We get the maximum imbalance ratio:

  irmax = 1998, i.e., imax = 10
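The bound above can be checked numerically. Substituting p = Rmax·ir/(1+ir) into ε ≥ ε0 gives 2·Rmax·ir ≥ ε0·(1+ir)², i.e., ir² − (2·Rmax/ε0 − 2)·ir + 1 ≤ 0; a sketch solving that quadratic:

```python
import math

# The largest root of ir^2 - (2*Rmax/eps0 - 2)*ir + 1 = 0 is the maximum
# imbalance ratio at which an overgeneral classifier is still flagged
# as inaccurate (eps >= eps0).
def ir_max(Rmax=1000.0, eps0=1.0):
    b = 2.0 * Rmax / eps0 - 2.0   # middle coefficient of the quadratic
    return (b + math.sqrt(b * b - 4.0)) / 2.0

# With the slides' settings this gives ir_max close to 1998, hence
# i_max = floor(log2(ir_max)) = 10.
```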
4. XCS and Class Imbalances
4.1. Classifiers' Error

XCS computes the classifiers' prediction (p) and error (ε) as window averages:

– Prediction: p_{t+1} = p_t + β (R − p_t)
– Error: ε_{t+1} = ε_t + β (|R − p_t| − ε_t)

β determines the size of the window: the influence of each reward decays over time, and the effect of previous rewards is eventually forgotten. The smaller β, the longer past rewards keep influencing the estimates.

[Plot: influence of the reward received at time t on the estimates at t+1 … t+8, for β = 0.2, β = 0.1, and β = 0.05.]
4. XCS and Class Imbalances
4.2. Stability of Prediction and Error Estimates

Stability of prediction and error for ir = 128:1

[Density plots of the overgeneral classifiers' prediction (range 900 to 1000) and error (range 0 to 100) estimates, for β = 0.2 (top) and β = 0.002 (bottom), with the theoretical reference values 992.24 and 7.75 marked. With β = 0.2 the estimates are spread over a wide range; with β = 0.002 they concentrate around the theoretical values.]

As ir increases, β should be decreased to stabilize the prediction and error estimates.
4. XCS and Class Imbalances
4.3. Occurrence-based Reproduction

To receive a GA event, a classifier has to belong to [A].

Frequency of occurrence (11-Mux, ir = 128:1), with ℓs selection bits:

Classifier           p_occ
000 0#######:0       ir / (2^(ℓs+1) · (1 + ir)) = 0.062
000 1#######:1       1 / (2^(ℓs+1) · (1 + ir)) = 0.000484
### ########:0       1/2 = 0.5
### ########:1       1/2 = 0.5

[Plot: p_occ as a function of ir (0 to 500) for the three types of classifier.]

Classifiers that occur more frequently:
– Have better estimates
– Tend to have more genetic opportunities… depending on θGA
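The occurrence probabilities above can be sketched as follows (assuming uniformly distributed addresses and random action selection during exploration, so a matching classifier lands in [A] half the time):

```python
# Occurrence probability of a classifier in [A], for the imbalanced
# multiplexer with ls selection bits and imbalance ratio ir.
def p_occ_majority(ir, ls=3):
    # covers one majority niche: matches 1/2**ls of the majority stream,
    # and is in [A] for half of the (random) action selections
    return ir / (2 ** (ls + 1) * (1 + ir))

def p_occ_minority(ir, ls=3):
    # covers one minority niche, which is under-sampled by factor ir
    return 1 / (2 ** (ls + 1) * (1 + ir))

def p_occ_overgeneral():
    return 0.5  # matches every instance; in [A] for half the random actions
```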
4. XCS and Class Imbalances
4.3. Occurrence-based Reproduction

Genetic opportunities
– A classifier goes through a genetic event when:
  • It occurs in [A]
  • The average time since the last GA application in [A] exceeds θGA

[Timelines of GA events with θGA = 25. The overgeneral classifier ###########:0/1 occurs at every time step (small Tocc), so it receives a GA event roughly every 25 steps (at times 25, 50, 75, 100, …); the minority-niche classifier 000 1#######:1 occurs far less often, so its GA events are much rarer.]

To balance the genetic opportunities that the different niches receive, set θGA = Tocc of the most infrequent niche.
5. Guidelines for Parameter Tuning

From the analysis we can extract the following guidelines:

Rmax and ε0 determine the threshold between negligible noise and imbalance ratio.

β represents the reward forgetfulness rate. We want the window average to still account for the under-sampled instances:

  β = (1/k) · (fmin / fmaj)

θGA sets the GA rate when Tocc < θGA. If we want all niches to receive the same number of genetic opportunities:

  θGA = k / (2 · fmin)

where fmin and fmaj are the occurrence frequencies of the most infrequent and most frequent niches, and k is a constant.
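A sketch of the two guidelines above (k is a free proportionality constant; fmin and fmaj are the niche occurrence frequencies, with fmin/fmaj = 1/ir in the imbalanced multiplexer):

```python
# beta guideline: forget rewards slowly enough that under-sampled
# instances still contribute to the window averages.
def beta_guideline(f_min, f_maj, k=1.0):
    return (1.0 / k) * (f_min / f_maj)

# theta_GA guideline: give the most infrequent niche time to occur
# between consecutive GA events.
def theta_ga_guideline(f_min, k=1.0):
    return k / (2.0 * f_min)

# Doubling ir halves beta and doubles theta_GA, which matches the
# slides' experimental schedule (beta = 0.04, 0.02, 0.01, 0.005 and
# theta_GA = 200, 400, 800, 1600).
```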
5. Guidelines for Parameter Tuning
We set β = {0.04, 0.02, 0.01, 0.005} and θGA = {200, 400, 800, 1600}.

[Plots: performance with the standard configuration (ir = 16:1, 32:1, 64:1) versus the configuration following the guidelines (ir = 64:1, 128:1, 256:1).]
6. Online Adaptation
Problem: how can we estimate the niche frequency?

– In the multiplexer: fmin = fmaj / ir

– In a real-world problem, niche frequencies may not be related to the imbalance ratio: small disjuncts.

[Figures: two domains, both with ir = 5; in one the minority class forms a single region, in the other it is split into several small disjuncts.]
6. Online Adaptation
Our approach: let XCS discover the small disjuncts.

We search for regions that promote overgeneral classifiers.
We estimate ircl based on those regions.
We use ircl to adapt β and θGA.

[Figure: an overgeneral classifier covering a region with ircl = 14:1.]
6. Online Adaptation
The Algorithm
– Check whether the classifier's prediction oscillates.
– Estimate the imbalance ratio ircl.
– Require a minimum of experience and numerosity before adapting the parameters.
– Adapt the parameters (β and θGA) following the guidelines and the estimation of ircl.
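A hedged sketch of the adaptation steps above; the field names, thresholds, and the oscillation test are illustrative assumptions, not the authors' exact algorithm:

```python
def adapt(cl, exp_min=20, num_min=5, k=1.0):
    """Adapt beta and theta_GA of a classifier cl (a dict of its stats)."""
    # 1. check if the prediction oscillates between Rmax and 0,
    #    which suggests the classifier is overgeneral
    oscillates = cl["p_max_seen"] - cl["p_min_seen"] > cl["Rmax"] / 2
    if not oscillates:
        return cl
    # 2. estimate the imbalance ratio of the region the classifier covers
    ir_cl = cl["majority_matches"] / max(cl["minority_matches"], 1)
    # 3. require minimum experience and numerosity before trusting ir_cl
    if cl["exp"] < exp_min or cl["num"] < num_min:
        return cl
    # 4. adapt the parameters following the guidelines
    cl["beta"] = (1.0 / k) * (1.0 / ir_cl)     # beta ~ fmin/fmaj = 1/ir_cl
    cl["theta_GA"] = k * (1.0 + ir_cl)         # ~ Tocc of the rarest niche
    return cl
```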
6. Online Adaptation
[Plots: performance with the standard configuration (ir = 16:1, 32:1, 64:1), the configuration following the guidelines, and the online adaptation (ir = 64:1, 128:1, 256:1).]
7. Conclusions
We studied the behavior of XCS when the training set is unbalanced
XCS with the standard configuration can only solve the multiplexer up to an imbalance ratio of ir = 16.

The theoretical analysis shows that XCS is highly robust to class imbalances if:

– Classifier estimates are accurate
– The number of genetic opportunities of the niches is balanced

We defined guidelines to adapt XCS's parameters:

– XCS could solve the multiplexer up to an imbalance ratio of ir = 256
7. Conclusions
As an advantage over other learners, XCS can automatically discover small disjuncts: self-adaptation of parameters.
7. Further Work
What about the convergence time?
– An increase of θGA means a decrease of the search for promising rules.

Cluster-based resampling methods… unfortunately, there is no direct relation between cluster and niche.

What about niche-based resampling?

[Figure: a niche with irniche = 14:1; resample these instances according to 1/irniche.]