Active Learning
Maria-Florina Balcan
Lecture 26
Active Learning
[Diagram: a Data Source supplies unlabeled examples to the Learning Algorithm. The algorithm repeatedly sends a request for the label of an example to an Expert / Oracle, which returns a label for that example. Finally, the algorithm outputs a classifier.]
• The learner can choose specific examples to be labeled.
• The learner works harder, in order to use fewer labeled examples.
• Choose the label requests carefully, to get informative labels.
What Makes a Good Algorithm?
• Guaranteed to output a relatively good classifier for most learning problems.
• Doesn’t make too many label requests.
Can It Really Do Better Than Passive?
• YES! (sometimes)
• We often need far fewer labels for active learning than for passive.
• This is predicted by theory and has been observed in practice.
Can adaptive querying help? [CAL92, Dasgupta04]
• Threshold functions on the real line: h_w(x) = 1(x ≥ w), C = {h_w : w ∈ ℝ}.
• Sample O(1/ε) unlabeled examples; do binary search over them.
• Binary search needs just O(log 1/ε) labels.
Passive supervised: Ω(1/ε) labels to find an ε-accurate threshold.
Active: only O(log 1/ε) labels.
Exponential improvement.
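The threshold case above can be sketched in a few lines (illustrative code, not from the lecture; the function and oracle names are my own). Sorting the unlabeled pool and binary searching for the label boundary spends one label query per step, so only O(log n) of the n points are ever labeled. The sketch assumes the realizable (noise-free) setting:

```python
def active_learn_threshold(points, oracle):
    """Actively learn a 1-D threshold h_w(x) = 1(x >= w).

    Binary searches the sorted unlabeled pool for the label boundary,
    issuing one label query per step, so only O(log n) of the n points
    are ever labeled. Assumes noise-free labels. Returns the smallest
    point labeled 1 (any threshold in the final gap is consistent with
    every queried label)."""
    xs = sorted(points)
    lo, hi = 0, len(xs)            # invariant: xs[:lo] -> 0, xs[hi:] -> 1
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(xs[mid]) == 1:   # one label query per iteration
            hi = mid
        else:
            lo = mid + 1
    return xs[lo] if lo < len(xs) else float("inf")
```

With n unlabeled points this makes at most about ⌈log₂ n⌉ queries, versus labeling all n points passively.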
Other interesting results as well.
Active Learning might not help [Dasgupta04]
In general, the number of queries needed depends on C and also on D.
• C = {linear separators in ℝ¹}: active learning reduces sample complexity substantially.
• C = {linear separators in ℝ²}: there are some target hypotheses for which no improvement can be achieved, no matter how benign the input distribution!
In this case, learning to accuracy ε requires Ω(1/ε) labels.
[Figure: hypotheses h0, h1, h2, h3.]
Examples where Active Learning helps
In general, the number of queries needed depends on C and also on D.
• C = {linear separators in ℝ¹}: active learning reduces sample complexity substantially, no matter what the input distribution is.
• C = homogeneous linear separators in ℝᵈ, D = uniform distribution over the unit sphere: need only O(d log 1/ε) labels to find a hypothesis with error rate < ε.
  • Freund et al., '97
  • Dasgupta, Kalai, Monteleoni, COLT 2005
  • Balcan, Broder, Zhang, COLT 2007
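To see the scale of the gap, compare a ~d/ε supervised label cost with the ~d log(1/ε) active bound, at an illustrative d = 10 with constants dropped:

```python
import math

# Illustrative comparison of label-complexity scalings (constants dropped),
# with d = 10 chosen arbitrarily for the sake of the example.
d = 10
for eps in (0.1, 0.01, 0.001):
    passive = d / eps                 # supervised: ~ d/eps labels
    active = d * math.log(1 / eps)    # active:     ~ d log(1/eps) labels
    print(f"eps={eps}: passive ~ {passive:.0f}, active ~ {active:.0f}")
```

At ε = 0.001 this is roughly 10,000 labels versus about 69: the exponential improvement referred to above.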
Region of uncertainty [CAL92]
• Example: data lies on a circle in ℝ² and hypotheses are homogeneous linear separators.
• Current version space: part of C consistent with labels so far.
• "Region of uncertainty" = part of data space about which there is still some uncertainty (i.e., disagreement within the version space).
[Figure: current version space and the corresponding region of uncertainty in data space.]
Region of uncertainty [CAL92]
Algorithm: Pick a few points at random from the current region of uncertainty and query their labels.
[Figure: current version space and region of uncertainty.]
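For the 1-D threshold class, this query-by-disagreement scheme can be sketched concretely (illustrative code, not from the lecture): the version space of consistent thresholds is an interval, and a streamed point is sent to the oracle only if the version space still disagrees on it.

```python
def cal_thresholds(stream, oracle):
    """CAL-style selective sampling for thresholds h_w(x) = 1(x >= w).

    Consistent thresholds form an interval (lo, hi]; classifiers in the
    version space disagree on a point x exactly when lo < x < hi, so only
    those points are sent to the labeling oracle. Returns the final
    version-space interval and the number of label queries made."""
    lo, hi = float("-inf"), float("inf")
    queries = 0
    for x in stream:
        if not (lo < x < hi):      # version space agrees on x: skip the label
            continue
        queries += 1
        if oracle(x) == 1:         # label 1  =>  true threshold w <= x
            hi = x
        else:                      # label 0  =>  true threshold w > x
            lo = x
    return (lo, hi), queries
```

On a uniform stream the interval shrinks geometrically, so the number of queries grows only logarithmically in the stream length.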
Region of uncertainty [CAL92]
• After incorporating the queried labels, we obtain a new, smaller version space and a new region of uncertainty in data space.
[Figure: new version space and new region of uncertainty in data space.]
Region of uncertainty [CAL92], Guarantees
Algorithm: Pick a few points at random from the current region of uncertainty and query their labels.
• C = homogeneous linear separators in ℝᵈ, D = uniform distribution over the unit sphere:
  • realizable case: d^{3/2} log(1/ε) labels;
  • low noise: need only d² log(1/ε) labels to find a hypothesis with error rate < ε;
  • supervised: d/ε labels.
[Balcan, Beygelzimer, Langford, ICML'06] Analyze a version of this algorithm which is robust to noise:
• C = linear separators on the line, low noise: exponential improvement.
Margin Based Active-Learning Algorithm [Balcan-Broder-Zhang, COLT 07]
Use O(d) examples to find w_1 of error ≤ 1/8.
iterate k = 2, …, log(1/ε):
  • rejection sample m_k samples x from D satisfying |w_{k-1} · x| ≤ γ_k;
  • label them;
  • find w_k ∈ B(w_{k-1}, 1/2^k) consistent with all these examples.
end iterate
[Figure: w_k, w_{k+1}, margin γ_k, and target w*.]
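A runnable sketch of this scheme for d = 2, where a homogeneous separator is just an angle (my own illustrative code: the constants γ_k = 1/2^k and radius π/2^k, the sample sizes, and the grid-search stand-in for the "find a consistent w_k" step are simplifications, not the paper's exact choices):

```python
import math
import random

def margin_active_2d(oracle, rounds=7, m_k=80, grid=400):
    """Margin-based active learning sketch in R^2, in the spirit of BBZ'07.

    A homogeneous separator is w(t) = (cos t, sin t). The step "find w_k in
    B(w_{k-1}, 1/2^k) consistent with the labeled examples" is approximated
    by maximizing empirical agreement over a grid of candidate angles.
    Returns (final angle, total number of label queries)."""
    def draw():                      # unlabeled example, uniform on the circle
        t = random.uniform(0.0, 2.0 * math.pi)
        return (math.cos(t), math.sin(t))

    def predict(theta, x):           # prediction of w(theta) on x
        return 1 if math.cos(theta) * x[0] + math.sin(theta) * x[1] >= 0 else -1

    def best(cands, data):           # grid stand-in for the consistency step
        return max(cands, key=lambda t: sum(predict(t, x) == y for x, y in data))

    # Round 1: plain labeled sample to get a rough initial hypothesis w_1.
    data = [(x, oracle(x)) for x in (draw() for _ in range(100))]
    queries = 100
    theta = best([2 * math.pi * i / grid for i in range(grid)], data)

    for k in range(2, rounds + 1):
        gamma, radius = 1.0 / 2 ** k, math.pi / 2 ** k
        band = []
        while len(band) < m_k:       # rejection sample from the margin band
            x = draw()
            if abs(math.cos(theta) * x[0] + math.sin(theta) * x[1]) <= gamma:
                band.append(x)
        data = [(x, oracle(x)) for x in band]   # label only the band points
        queries += m_k
        theta = best([theta - radius + 2 * radius * i / grid
                      for i in range(grid)], data)
    return theta % (2.0 * math.pi), queries
```

The point of the band is that, once w_{k-1} is roughly right, examples far from its boundary carry almost no information, so labels are spent only where the version space still disagrees.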
Margin Based Active-Learning, Realizable Case
Theorem: Assume P_X is uniform over S_d. If [γ_k and m_k are set as in the formulas on the slide], then after [the stated number of] iterations, w_s has error ≤ ε.
[Figures: unit vectors u and v at angle θ(u, v); Facts 1–3 are formulas (shown on the slide, omitted here) about the uniform distribution used in the proof.]
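One standard fact behind analyses of this kind (well known, though I cannot tell whether it is exactly one of the Facts on the slide): for homogeneous linear separators u, v and x uniform over the unit sphere, the disagreement probability equals the angle between them divided by π,

```latex
\Pr_{x \sim \mathrm{Unif}(S^{d-1})}\bigl[\operatorname{sign}(u \cdot x) \neq \operatorname{sign}(v \cdot x)\bigr] \;=\; \frac{\theta(u, v)}{\pi},
```

so driving θ(w_s, w*) down to επ is exactly driving the error of w_s down to ε.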
BBZ'07, Proof Idea
iterate k = 2, …, log(1/ε):
  rejection sample m_k samples x from D satisfying |w_{k-1} · x| ≤ γ_k;
  ask for labels and find w_k ∈ B(w_{k-1}, 1/2^k) consistent with all these examples.
end iterate
Assume w_k has error ≤ ε. We are done if we can show that w_{k+1} has error ≤ ε/2, using only O(d log(1/ε)) labels in round k.
[Figure: w_k, w_{k+1}, margin γ_k, and target w*.]
BBZ'07, Proof Idea
Key Point: Under the uniform distribution assumption, for [γ_k set as specified on the slide] the error contributed by the region outside the band |w_k · x| ≤ γ_k is ≤ ε/4.
So, it's enough to ensure that [w_{k+1} errs on at most an ε/4 fraction of the band]. We can do so by using only O(d log(1/ε)) labels in round k.
[Figure: w_k, w_{k+1}, margin γ_k, and target w*.]