
Active learning: Scenarios and techniques



Page 1: Active learning: Scenarios and techniques

ACTIVE LEARNING: Scenarios and techniques

ABBASSI SABER [email protected]

Page 2: Active learning: Scenarios and techniques

Outline

■ Scenarios

■ Query Strategy Frameworks (techniques)

Page 3: Active learning: Scenarios and techniques

Scenarios

■ Membership Query Synthesis

■ Stream-Based Selective Sampling

■ Pool-Based Sampling

Page 4: Active learning: Scenarios and techniques

Scenarios

Page 5: Active learning: Scenarios and techniques

Scenarios

■ Membership Query Synthesis

Page 6: Active learning: Scenarios and techniques

Scenarios

■ Stream-Based Selective Sampling

For each instance drawn one at a time from the stream, the learner decides: query its label, or discard it? (Yes / No)

Page 7: Active learning: Scenarios and techniques

Scenarios

■ Pool-Based Sampling
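As a rough sketch of the pool-based scenario (the nearest-centroid "model", the data, and the oracle below are all illustrative assumptions, not from the slides), each round the learner trains on the labeled set, scores the whole pool, queries one instance, and moves it from the pool to the labeled set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a large unlabeled pool U and a tiny labeled seed set L.
pool = rng.normal(size=(100, 2))
labeled_X = np.array([[-2.0, 0.0], [2.0, 0.0]])
labeled_y = np.array([0, 1])

def predict_proba(X, centroids):
    """Stand-in model: softmax over negative distances to class centroids."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

for _ in range(5):  # five query rounds
    centroids = np.array([labeled_X[labeled_y == c].mean(axis=0) for c in (0, 1)])
    proba = predict_proba(pool, centroids)
    idx = int(np.argmin(proba.max(axis=1)))  # least-confident pool instance
    oracle_label = int(pool[idx, 0] > 0)     # stand-in for the human annotator
    labeled_X = np.vstack([labeled_X, pool[idx]])
    labeled_y = np.append(labeled_y, oracle_label)
    pool = np.delete(pool, idx, axis=0)

print(labeled_y.size, pool.shape[0])  # 7 labeled, 95 left in the pool
```

The query rule inside the loop is interchangeable: any of the strategies from the next section can replace the least-confident score.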

Page 8: Active learning: Scenarios and techniques

Query Strategy Frameworks

Page 9: Active learning: Scenarios and techniques

1- Uncertainty Sampling:

■ The active learner queries the instances about which it is least certain how to label.

■ Three variants: least confident, margin sampling, and entropy.

Page 10: Active learning: Scenarios and techniques

1- Uncertainty Sampling:

Least Confident:

■ Select the instance whose most likely prediction the model is least confident about:

$x^*_{LC} = \underset{x}{\operatorname{argmax}}\; 1 - P_\theta(\hat{y} \mid x)$

Where:

$\hat{y} = \underset{y}{\operatorname{argmax}}\; P_\theta(y \mid x)$ (the most likely class label under the model $\theta$)
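A minimal sketch of this rule (the probability matrix below is an invented example), picking the instance whose top predicted probability is lowest:

```python
import numpy as np

def least_confident(proba):
    """x*_LC = argmax_x (1 - P(y_hat | x)): query the instance whose
    top predicted probability is lowest. proba: (n_instances, n_labels)."""
    return int(np.argmin(proba.max(axis=1)))

proba = np.array([
    [0.90, 0.10],   # confident prediction
    [0.55, 0.45],   # least confident top label -> queried
    [0.80, 0.20],
])
print(least_confident(proba))  # -> 1
```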

Page 11: Active learning: Scenarios and techniques

1- Uncertainty Sampling:

Least Confident:

■ The drawback is that least confident considers only information about the most likely label, so the posteriors of all remaining labels are ignored.

Page 12: Active learning: Scenarios and techniques

1- Uncertainty Sampling:

Margin sampling:

■ Aims to correct a shortcoming of the least confident strategy.

■ Incorporates the posterior of the second most likely label.

■ Instances with a large margin are easy, since the model has little doubt differentiating between the two most likely classes; instances with a small margin are ambiguous, hence informative.
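The points above can be sketched as follows (the probabilities are invented for illustration): the queried instance is the one with the smallest gap between its two most likely labels.

```python
import numpy as np

def margin_sampling(proba):
    """x*_M = argmin_x P(y1|x) - P(y2|x): query the instance with the
    smallest margin between its two most likely labels."""
    top2 = np.sort(proba, axis=1)[:, -2:]
    margins = top2[:, 1] - top2[:, 0]
    return int(np.argmin(margins))

proba = np.array([
    [0.50, 0.30, 0.20],   # margin 0.20
    [0.40, 0.38, 0.22],   # margin 0.02 -> queried
    [0.70, 0.20, 0.10],   # margin 0.50
])
print(margin_sampling(proba))  # -> 1
```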

Page 13: Active learning: Scenarios and techniques

1- Uncertainty Sampling:

Margin sampling:

■ Drawback: when there are many labels, the margin still ignores the rest of the output distribution, so we are back to the same problem as least confident.

Page 14: Active learning: Scenarios and techniques

1- Uncertainty Sampling:

Entropy:

■ A more general uncertainty sampling strategy is to select the instance that brings the greatest quantity of information:

$x^*_{H} = \underset{x}{\operatorname{argmax}}\; -\sum_i P_\theta(y_i \mid x)\, \log P_\theta(y_i \mid x)$

■ Select the instance that has, over all labels, the highest entropy.
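A short sketch of the entropy criterion (invented probabilities): the uniform posterior carries maximum entropy and is queried.

```python
import numpy as np

def entropy_sampling(proba, eps=1e-12):
    """x*_H = argmax_x -sum_i P(y_i|x) log P(y_i|x)."""
    H = -(proba * np.log(proba + eps)).sum(axis=1)
    return int(np.argmax(H))

proba = np.array([
    [0.80, 0.10, 0.10],
    [1/3,  1/3,  1/3],    # uniform posterior: maximum entropy -> queried
    [0.60, 0.30, 0.10],
])
print(entropy_sampling(proba))  # -> 1
```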

Page 15: Active learning: Scenarios and techniques

1- Uncertainty Sampling:

■ The most informative instances lie at the center of the triangle (where the posterior label distribution is uniform).

■ The least informative instances lie at the three corners (where one of the classes has extremely high probability).

Page 16: Active learning: Scenarios and techniques

2- Query-By-Committee

Page 17: Active learning: Scenarios and techniques

2- Query-By-Committee

Page 18: Active learning: Scenarios and techniques

2- Query-By-Committee

■ Each member of the committee votes on the labeling of query candidates.

■ The most informative query is considered to be the instance about which they most disagree.

■ Two approaches to calculate the level of disagreement:

– Vote Entropy

– Average Kullback-Leibler (KL) divergence

■ In information and probability theory, this is a measure of the difference between two probability distributions P and Q.

■ It is the expectation of the logarithmic difference between the probabilities P and Q.

Page 19: Active learning: Scenarios and techniques

2- Query-By-Committee

Vote Entropy

$x^*_{VE} = \underset{x}{\operatorname{argmax}}\; -\sum_i \frac{V(y_i)}{C} \log \frac{V(y_i)}{C}$

Where:

$V(y_i)$: number of votes received by the label $y_i$ from among the committee members' predictions

$C$: committee size
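A minimal sketch of vote entropy (the vote counts are an invented example): a unanimous committee scores zero, a split committee scores highest and gets queried.

```python
import numpy as np

def vote_entropy(votes, C):
    """x*_VE = argmax_x -sum_i V(y_i)/C log V(y_i)/C.
    votes: (n_instances, n_labels) vote counts over C committee members."""
    p = votes / C
    terms = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return -terms.sum(axis=1)

votes = np.array([
    [5, 0, 0],   # unanimous committee: zero disagreement
    [2, 2, 1],   # split committee: highest vote entropy -> queried
    [3, 1, 1],
])
scores = vote_entropy(votes, C=5)
print(int(np.argmax(scores)))  # -> 1
```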

Page 20: Active learning: Scenarios and techniques

2- Query-By-Committee

Average Kullback-Leibler (KL) divergence

$x^*_{KL} = \underset{x}{\operatorname{argmax}}\; \frac{1}{C} \sum_{c=1}^{C} D\big(P_{\theta^{(c)}} \,\|\, P_{\mathcal{C}}\big)$

Where:

$D\big(P_{\theta^{(c)}} \,\|\, P_{\mathcal{C}}\big) = \sum_i P_{\theta^{(c)}}(y_i \mid x) \log \frac{P_{\theta^{(c)}}(y_i \mid x)}{P_{\mathcal{C}}(y_i \mid x)}$

And

$P_{\mathcal{C}}(y_i \mid x) = \frac{1}{C} \sum_{c=1}^{C} P_{\theta^{(c)}}(y_i \mid x)$

$x^*_{KL}$ is the instance with the highest average KL divergence between the committee members and the consensus.

$P_{\mathcal{C}}(y_i \mid x)$: the consensus probability that the label $y_i$ is correct for $x$ (the average of the probabilities from the different committee models)

$P_{\theta^{(c)}}(y_i \mid x)$: the probability that $y_i$ is the correct label for $x$ under committee member $c$
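This can be sketched directly from the definitions above (the two-member committee and its probabilities are invented for illustration):

```python
import numpy as np

def avg_kl_disagreement(member_probs):
    """member_probs: (C, n_instances, n_labels), with member_probs[c] holding
    P_theta_c(y_i | x) for each instance. Returns the per-instance average
    KL divergence of each member from the consensus distribution P_C."""
    consensus = member_probs.mean(axis=0)                        # P_C(y_i | x)
    kl = (member_probs * np.log(member_probs / consensus)).sum(axis=2)
    return kl.mean(axis=0)

member_probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],   # member 1
    [[0.9, 0.1], [0.1, 0.9]],   # member 2 disagrees on the second instance
])
scores = avg_kl_disagreement(member_probs)
print(int(np.argmax(scores)))  # -> 1 (the disputed instance is queried)
```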

Page 21: Active learning: Scenarios and techniques

3- Expected Model Change

■ Selects the instance that would impart the greatest change to the current model if we knew its label.

■ Example: the Expected Gradient Length (EGL), generally applied to problems where gradient-based training is used:

$x^*_{EGL} = \underset{x}{\operatorname{argmax}}\; \sum_i P_\theta(y_i \mid x)\, \big\| \nabla \ell\big(\mathcal{L} \cup \langle x, y_i \rangle; \theta\big) \big\|$

where $\ell$ is the objective function and $\mathcal{L}$ the current labeled set.

■ Drawback: this technique can be computationally expensive if both the feature space and the label set are very large.
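A rough sketch of EGL under an assumed toy binary logistic model (the weights, candidates, and single-example gradient are all illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -0.25])   # current weights of the assumed toy logistic model

def grad(x, y):
    """Gradient of the logistic log-loss for a single labeled pair (x, y)."""
    return (sigmoid(w @ x) - y) * x

def egl(x):
    """Expected gradient length: sum_i P(y_i|x) * ||grad of loss with <x, y_i>||."""
    p1 = sigmoid(w @ x)
    return (1 - p1) * np.linalg.norm(grad(x, 0)) + p1 * np.linalg.norm(grad(x, 1))

candidates = np.array([[0.1, 0.1], [3.0, -2.0]])
scores = [egl(x) for x in candidates]
print(int(np.argmax(scores)))  # -> 1 (the larger expected model change)
```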

Page 22: Active learning: Scenarios and techniques

4- Expected Error Reduction

■ Measures how much the generalization error is likely to be reduced.

Page 23: Active learning: Scenarios and techniques

4- Expected Error Reduction

■ Measures how much the generalization error is likely to be reduced:

$x^*_{0/1} = \underset{x}{\operatorname{argmin}}\; \sum_i P_\theta(y_i \mid x) \left( \sum_{u=1}^{U} 1 - P_{\theta^{+\langle x, y_i \rangle}}(\hat{y} \mid x^{(u)}) \right)$

■ The problem is that it computes the expected error only for the most likely label, as if the correct labeling were always the most likely one.

Where:

$\theta^{+\langle x, y_i \rangle}$: the new model after it has been re-trained with the tuple $\langle x, y_i \rangle$ added

$\hat{y}$: the most likely label

Page 24: Active learning: Scenarios and techniques

4- Expected Error Reduction

■ Another variant, which is more "credible", is to minimize the expected total number of incorrect predictions over the whole label distribution.

■ Another way to understand it: choose the instance that is expected to reduce the future output entropy over the unlabeled pool the most.

■ Unlike the previous variant, it does not consider only the most likely label.
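The retrain-per-candidate-per-label structure can be sketched as follows, reusing a toy nearest-centroid model as a stand-in (the model and all data are invented for illustration; a real implementation would retrain the actual classifier):

```python
import numpy as np

def proba(X, cents):
    """Stand-in nearest-centroid model: softmax over negative distances."""
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

def expected_error(Lx, Ly, pool, x, classes=(0, 1)):
    """Expected pool error after adding <x, y_i>, averaged over P(y_i | x)."""
    cents = np.array([Lx[Ly == c].mean(axis=0) for c in classes])
    p_y = proba(x[None, :], cents)[0]
    err = 0.0
    for yi, p in zip(classes, p_y):
        Lx2 = np.vstack([Lx, x])           # "re-train" with <x, y_i> added
        Ly2 = np.append(Ly, yi)
        cents2 = np.array([Lx2[Ly2 == c].mean(axis=0) for c in classes])
        err += p * (1.0 - proba(pool, cents2).max(axis=1)).sum()
    return err

Lx = np.array([[-2.0, 0.0], [2.0, 0.0]])
Ly = np.array([0, 1])
pool = np.array([[-1.5, 0.2], [0.05, 0.0], [1.8, -0.1]])
scores = [expected_error(Lx, Ly, pool, x) for x in pool]
query = int(np.argmin(scores))  # instance expected to reduce future error most
```

The nested loop over candidates and labels, each requiring a retrained model and a full pool pass, is exactly why this strategy is costly in practice.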

Page 25: Active learning: Scenarios and techniques

5- Variance reduction

■ Instead of looking for the minimal future expected error (expensive, and not available in closed form), look for the minimal future model variance.

Page 26: Active learning: Scenarios and techniques

6- Density-Weighted Methods

■ The idea is that informative instances should not only be those which are uncertain, but also those which are "representative" of the underlying distribution.

(Figure: one instance is the most uncertain but not representative; another is not the most uncertain but more representative.)

Page 27: Active learning: Scenarios and techniques

6- Density-Weighted Methods

■ Query instances as follows:

Page 28: Active learning: Scenarios and techniques

6- Density-Weighted Methods

■ Query instances as follows:

$x^*_{ID} = \underset{x}{\operatorname{argmax}}\; \phi_A(x) \times \left( \frac{1}{U} \sum_{u=1}^{U} \operatorname{sim}\big(x, x^{(u)}\big) \right)^{\beta}$

Where:

$\phi_A(x)$: the informativeness of $x$ according to one of the previous techniques (uncertainty sampling, QBC, …)

$\operatorname{sim}(x, x^{(u)})$: the similarity between $x$ and the unlabeled instance $x^{(u)}$; the averaged similarity term measures the representativeness of $x$, and $\beta$ controls its relative importance.
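A short sketch of information density (cosine similarity is an assumed choice of sim, and the pool, candidates, and base scores are invented): the uncertain outlier loses to the representative instance near the cluster.

```python
import numpy as np

def information_density(phi_A, X, pool, beta=1.0):
    """phi_ID(x) = phi_A(x) * ((1/U) sum_u sim(x, x_u))^beta,
    using cosine similarity against the unlabeled pool."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Pn = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    density = (Xn @ Pn.T).mean(axis=1)   # average similarity to the pool
    return phi_A * density ** beta

pool = np.array([[1.0, 1.0], [0.9, 1.1], [1.2, 0.8], [1.0, 1.2]])
X = np.array([[5.0, -5.0],    # outlier: most uncertain but unrepresentative
              [1.1, 0.9]])    # near the cluster: representative
phi_A = np.array([0.9, 0.6])  # base informativeness (e.g. entropy scores)
scores = information_density(phi_A, X, pool)
print(int(np.argmax(scores)))  # -> 1: the representative instance wins
```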