Semisupervised Learning
A brief introduction
Semisupervised Learning
• Introduction
• Types of semisupervised learning
• Paper for review
• References
Semisupervised Learning: Introduction
Semisupervised learning is an extension of supervised learning. We have two sets of data: a labeled set {(x₁, y₁), …, (xₗ, yₗ)} and an (often much larger) unlabeled set {xₗ₊₁, …, xₗ₊ᵤ}.
Motivation: labeled data is sometimes hard to obtain.
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
An example from Mars Data Analysis
Digital Elevation Map | Geomorphic Map
Martian landscape
Manually drawn geomorphic map of this landscape
Geomorphic map shows landforms chosen and defined by a domain expert.
Segmentation
Segmentation: Results.
Displayed on an elevation background.
2631 segments homogeneous in slope, curvature and flood.
Classification: Labeling.
A representative subset of objects is labeled as one of the following six classes:
• Plain
• Crater Floor
• Convex Crater Walls
• Concave Crater Walls
• Convex Ridges
• Concave Ridges
Labeled segments.
Semisupervised Learning: Introduction
How can we learn from unlabeled data at all?
The answer lies in the assumptions made about the unlabeled data distribution.
If the assumptions are right, unlabeled data can yield an advantage.
But if the assumptions are wrong, a decrease in performance is possible.
Semisupervised Learning: Introduction
[Illustrative figures obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007]
Semisupervised Learning
• Introduction
• Types of semisupervised learning
• Paper for review
• References
Semisupervised Learning: Types
Types of semi-supervised learning:
• Self-Training
• Generative Models
• Graph-Based Algorithms
• Multi-View Algorithms
• SVMs
Semisupervised Learning: Types
Self-Training
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
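The self-training loop in the figure is: train on the labeled data, predict on the unlabeled data, promote the most confident predictions into the training set, and repeat. A minimal pure-Python sketch of that loop; the class-mean "classifier" and the confidence threshold below are illustrative assumptions, not from the tutorial:

```python
# Minimal self-training sketch. The class-mean "classifier" and the
# confidence threshold are illustrative assumptions, not from the slides.

def fit(pairs):
    """Fit a toy classifier: the mean of x for each class label."""
    sums, counts = {}, {}
    for x, y in pairs:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(means, x):
    """Return (label, confidence): nearest class mean, with confidence
    given by the relative margin to the second-nearest mean."""
    ranked = sorted(means, key=lambda y: abs(x - means[y]))
    d0 = abs(x - means[ranked[0]])
    d1 = abs(x - means[ranked[1]])
    return ranked[0], (d1 / (d0 + d1) if d0 + d1 else 1.0)

def self_train(labeled, unlabeled, threshold=0.75, max_iter=10):
    """Repeatedly train, then promote confident unlabeled points to labeled."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_iter):
        model = fit(labeled)
        confident = [(x, predict(model, x)[0]) for x in unlabeled
                     if predict(model, x)[1] >= threshold]
        if not confident:
            break  # no unlabeled point is confident enough; stop early
        labeled.extend(confident)
        taken = {x for x, _ in confident}
        unlabeled = [x for x in unlabeled if x not in taken]
    return fit(labeled)

# Two labeled points seed the classes; self-training absorbs nearby unlabeled ones.
model = self_train([(0.0, "a"), (10.0, "b")], [1.0, 2.0, 8.0, 9.0])
```

Note how an early wrong promotion would be trained on in every later iteration, which is exactly the error-propagation weakness discussed in the comparison.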
Semisupervised Learning: Types
Self-Training Variations
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
Semisupervised Learning: Types
Generative Models
[Figures obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007]
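The generative approach assumes a probabilistic model p(x, y), e.g. a Gaussian mixture with one component per class, fits it to the labeled and unlabeled data together with EM, and classifies by posterior probability. A hypothetical 1-D sketch in pure Python; the toy data, initialization, and fixed iteration count are illustrative assumptions:

```python
import math

# Semisupervised Gaussian mixture fit with EM: labeled points keep their
# class; unlabeled points get soft assignments. 1-D toy sketch; the data,
# initialization, and iteration count are illustrative assumptions.

def gaussian(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_ssl(labeled, unlabeled, n_iter=50):
    classes = sorted({y for _, y in labeled})
    # Initialize each class's Gaussian from its labeled points.
    mu = {c: sum(x for x, y in labeled if y == c) /
             sum(1 for _, y in labeled if y == c) for c in classes}
    var = {c: 1.0 for c in classes}
    pi = {c: 1.0 / len(classes) for c in classes}
    n = len(labeled) + len(unlabeled)
    for _ in range(n_iter):
        # E-step: soft responsibilities for the unlabeled points only.
        resp = []
        for x in unlabeled:
            w = {c: pi[c] * gaussian(x, mu[c], var[c]) for c in classes}
            z = sum(w.values())
            resp.append({c: w[c] / z for c in classes})
        # M-step: re-estimate parameters from labeled (weight 1) + soft data.
        for c in classes:
            wsum = (sum(1 for _, y in labeled if y == c) +
                    sum(r[c] for r in resp))
            mu[c] = (sum(x for x, y in labeled if y == c) +
                     sum(r[c] * x for r, x in zip(resp, unlabeled))) / wsum
            sq = (sum((x - mu[c]) ** 2 for x, y in labeled if y == c) +
                  sum(r[c] * (x - mu[c]) ** 2 for r, x in zip(resp, unlabeled)))
            var[c] = max(sq / wsum, 1e-6)  # floor keeps the density finite
            pi[c] = wsum / n
    return mu, var, pi

# Two labeled points; EM pulls each class mean toward its unlabeled cluster.
mu, var, pi = em_ssl([(0.0, "a"), (10.0, "b")], [0.5, 1.0, 9.0, 9.5])
```

If the mixture assumption is wrong (e.g. a class is not Gaussian), the same update happily fits the wrong structure, which is how unlabeled data can hurt here.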
Semisupervised Learning: Types
Graph-Based Models
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
Semisupervised Learning: Types
Graph-Based Models
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
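Graph-based methods build a similarity graph over all points, labeled and unlabeled, and propagate labels along the edges: labeled nodes are clamped to their labels, and each unlabeled node iterates toward the weighted average of its neighbors (the harmonic-function idea). A hypothetical sketch; the chain graph, weights, and iteration count are illustrative:

```python
# Label propagation on a similarity graph: labeled nodes are clamped,
# unlabeled nodes repeatedly average their neighbors' scores.
# The graph, weights, and iteration count are illustrative assumptions.

def label_propagation(W, labels, n_iter=100):
    """W: n x n symmetric similarity matrix (list of lists);
    labels: {node_index: +1.0 or -1.0} for the labeled nodes.
    Returns one score per node; its sign is the predicted class."""
    n = len(W)
    f = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            if i in labels:
                new_f.append(labels[i])        # clamp labeled nodes
            else:
                z = sum(W[i])
                new_f.append(sum(W[i][j] * f[j] for j in range(n)) / z
                             if z else 0.0)    # weighted neighbor average
        f = new_f
    return f

# Chain graph 0-1-2-3-4 with unit edge weights; ends are labeled +1 and -1.
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
scores = label_propagation(W, {0: 1.0, 4: -1.0})
```

On the chain the scores interpolate linearly between the two labeled ends, so the predicted boundary falls at the midpoint; the method is only as good as the graph's fit to the data distribution.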
Semisupervised Learning: Types
Multi-View Algorithms
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
Semisupervised Learning: Types
Multi-View Algorithms
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
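Multi-view methods such as co-training assume each example has two (ideally conditionally independent) feature views; a classifier trained on each view labels its most confident unlabeled examples for the shared pool, so each view teaches the other. A hypothetical sketch with toy two-view data; the class-mean classifier, data, and threshold are illustrative assumptions:

```python
# Co-training sketch: two views, one simple classifier per view; each view's
# confident predictions augment the shared labeled set. The class-mean
# classifier, data, and threshold are illustrative assumptions.

def mean_fit(pairs, view):
    """Toy per-view classifier: mean of the chosen view's value per class."""
    sums, counts = {}, {}
    for x, y in pairs:
        sums[y] = sums.get(y, 0.0) + x[view]
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def mean_predict(means, v):
    """Return (label, confidence): nearest class mean and relative margin."""
    ranked = sorted(means, key=lambda y: abs(v - means[y]))
    d0 = abs(v - means[ranked[0]])
    d1 = abs(v - means[ranked[1]])
    return ranked[0], (d1 / (d0 + d1) if d0 + d1 else 1.0)

def co_train(labeled, unlabeled, threshold=0.75, max_iter=5):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_iter):
        m0, m1 = mean_fit(labeled, 0), mean_fit(labeled, 1)
        newly = []
        for x in unlabeled:
            y0, c0 = mean_predict(m0, x[0])
            y1, c1 = mean_predict(m1, x[1])
            if c0 >= threshold:
                newly.append((x, y0))   # view 0 is confident about x
            elif c1 >= threshold:
                newly.append((x, y1))   # view 1 covers what view 0 missed
        if not newly:
            break
        labeled.extend(newly)
        taken = {x for x, _ in newly}
        unlabeled = [x for x in unlabeled if x not in taken]
    return mean_fit(labeled, 0), mean_fit(labeled, 1)

# Each example is (view0, view1); for some points only one view is informative.
m0, m1 = co_train([((0.0, 0.0), "a"), ((10.0, 10.0), "b")],
                  [(1.0, 5.0), (9.0, 5.0), (5.0, 1.0), (5.0, 9.0)])
```

The points ambiguous in one view are labeled via the other view, which is the gain co-training offers when a natural feature split exists.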
Semisupervised Learning: Types
SVMs
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
Semisupervised Learning: Types
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
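Semi-supervised SVMs (transductive SVMs, S3VMs) push the decision boundary through low-density regions by penalizing unlabeled points that fall inside the margin. The standard objective, stated here from general knowledge of S3VMs rather than from the slides, is:

```latex
\min_{w,\,b}\;\; \frac{1}{2}\lVert w \rVert^{2}
  \;+\; C \sum_{i=1}^{l} \max\bigl(0,\; 1 - y_i\,(w^{\top} x_i + b)\bigr)
  \;+\; C^{*} \sum_{j=l+1}^{l+u} \max\bigl(0,\; 1 - \lvert w^{\top} x_j + b \rvert\bigr)
```

The third term, taken over the unlabeled points, makes the objective non-convex, which is why optimization is hard and local optima arise.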
Semisupervised Learning: Types
A comparison of different approaches:
• Self-Training: simple, but early mistakes propagate and can be very harmful.
• Generative Models: a great method if the probabilistic model is correct, but model correctness is difficult to verify; in fact, unlabeled data may decrease accuracy if the model is wrong.
• Graph-Based Models: a solid mathematical solution if the graph is a good representation of the data distribution.
• Multi-View Methods: simple and less sensitive to classification errors, but there may not be a natural split of the features.
• SVMs: usable wherever SVMs are applicable, but the objective is non-convex, so optimization is hard and may get stuck in local optima.
Semisupervised Learning
• Introduction
• Types of semisupervised learning
• Paper for review
• References
Semisupervised Learning: Paper
Unlabeled data: Now it helps, now it doesn’t by Singh et al.
Problem: analyze when semi-supervised learning improves generalization performance.
Figure obtained from: Singh et al. Unlabeled data: now it helps, now it doesn’t. NIPS (2008).
Semisupervised Learning: Paper
Unlabeled data: Now it helps, now it doesn’t by Singh et al.
Some terminology needed for the paper:
“The cluster assumption” means that the distributions of the classes in feature space are smooth on each set D ∈ D. The sets in D are called decision sets.
Figure obtained from: Singh et al. Unlabeled data: now it helps, now it doesn’t. NIPS (2008).
Semisupervised Learning: Paper
Unlabeled data: Now it helps, now it doesn’t by Singh et al.
Figure obtained from: Singh et al. Unlabeled data: now it helps, now it doesn’t. NIPS (2008).
Semisupervised Learning: Paper
Unlabeled data: Now it helps, now it doesn’t by Singh et al.
Text obtained from: Singh et al. Unlabeled data: now it helps, now it doesn’t. NIPS (2008).
Main Result: “…if the sets D are discernible using unlabeled data (the margin is large enough compared to average spacing between unlabeled data points), then there exists a semi-supervised learner that can perform as well as a supervised learner with clairvoyant knowledge of the decision sets, provided m ≫ n…”
Semisupervised Learning: Paper
Unlabeled data: Now it helps, now it doesn’t by Singh et al.
Figure obtained from: Singh et al. Unlabeled data: now it helps, now it doesn’t. NIPS (2008).
Semisupervised Learning
• Introduction
• Types of semisupervised learning
• Paper for review
• References
Semisupervised Learning