- Slide 1
- Machine Learning Week 4 Lecture 1
- Slide 2
- Hand-In: Data is coming online later today. I keep a test set with
approx. 1000 test images; that will be your real test. You are most
welcome to add regularization as we discussed last week, but it is
not a requirement. Hand-in Version 4 is available.
- Slide 3
- Recap: what is going on, and ways to fix it
- Slide 4
- Overfitting: Data increases -> overfitting decreases. Noise
increases -> overfitting increases. Target complexity increases
-> overfitting increases.
- Slide 5
- Learning Theory Perspective: In-sample error + model complexity.
Instead of picking a simpler hypothesis set, prefer simpler
hypotheses h from H. Define what "simple" means in a complexity
measure Omega(h), and minimize E_in(h) + Omega(h).
- Slide 6
- Regularization: In-sample error + model complexity. Weight Decay:
minimize E_in(w) + (lambda/N) w^T w. The gradient descent update
becomes w <- (1 - 2*eta*lambda/N) w - eta * grad E_in(w): every
round we take a step towards the zero vector.
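The weight-decay update above can be sketched as follows; the step size `eta`, penalty `lam`, and sample count `N` are illustrative values, not from the slides:

```python
import numpy as np

def weight_decay_step(w, grad_E_in, eta=0.1, lam=0.01, N=100):
    """One gradient step on E_in(w) + (lam/N) * w^T w.

    The gradient of the penalty term is (2*lam/N) * w, so the update
    first shrinks w toward the zero vector, then follows -grad_E_in.
    """
    return (1 - 2 * eta * lam / N) * w - eta * grad_E_in

# With a zero data gradient, every step shrinks w toward 0:
w = np.array([1.0, -2.0])
w = weight_decay_step(w, grad_E_in=np.zeros(2))
```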
- Slide 7
- Why are small weights better? Practical perspective: because in
practice we believe that noise is "noisy": stochastic noise is
high-frequency, and deterministic noise is also non-smooth, so
small weights (smoother hypotheses) fit less of it. Sometimes the
weights are weighed differently; the bias term gets a free ride
(it is usually not penalized).
- Slide 8
- Regularization Summary: More art than science; use VC and
bias-variance as guides. Weight decay is a universal technique,
based on the practical belief that noise is noisy (non-smooth).
Question: which regularizer to use? Many other regularizers exist.
Extremely important. Quote from the book: a "necessary evil".
- Slide 9
- Validation: Regularization estimates the complexity penalty;
validation estimates the out-of-sample error directly. Remember
the test set.
- Slide 10
- Model Selection: t models m_1, ..., m_t. Which is better? Compute
E_val(m_1), E_val(m_2), ..., E_val(m_t) and pick the minimum one.
Train on D_train, validate on D_val. Use this to find lambda for
my weight decay.
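A minimal sketch of validation-based selection of the weight-decay parameter, on a toy ridge-regression problem (the data, sizes, and lambda grid are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative train/validation split.
X_train, y_train = rng.normal(size=(80, 3)), rng.normal(size=80)
X_val, y_val = rng.normal(size=(20, 3)), rng.normal(size=20)

def train(X, y, lam):
    # Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def E_val(w):
    return np.mean((X_val @ w - y_val) ** 2)

lambdas = [0.01, 0.1, 1.0, 10.0]
errors = [E_val(train(X_train, y_train, lam)) for lam in lambdas]
best_lam = lambdas[int(np.argmin(errors))]  # pick the minimum E_val
```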
- Slide 11
- Validation Dilemma: with K validation points, increasing K
tightens the E_val estimate, but E_val itself increases (less data
is left for training). Small K vs. large K: we would like to have
both. Answer: cross validation.
- Slide 12
- K-Fold Cross Validation: split the data into N/K parts of size K.
Train on all but one part, test on the remaining one. Pick the
model that is best on average over the N/K partitions. Usual
choice: K = N/10, i.e. 10 folds (we do not have all day).
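The procedure above can be sketched as follows, using the slide's convention that each held-out part has K points; the least-squares model and the toy data are illustrative:

```python
import numpy as np

def cross_validation_error(X, y, train, error, K):
    """Average validation error over the N/K partitions of size K."""
    N = len(y)
    errs = []
    for start in range(0, N, K):
        held = np.arange(start, start + K)       # one part of size K
        rest = np.setdiff1d(np.arange(N), held)  # train on the rest
        w = train(X[rest], y[rest])
        errs.append(error(w, X[held], y[held]))
    return np.mean(errs)

# Tiny illustration with least squares (names are illustrative):
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=100)
lstsq = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
mse = lambda w, X, y: np.mean((X @ w - y) ** 2)
cv_err = cross_validation_error(X, y, lstsq, mse, K=10)  # K = N/10
```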
- Slide 13
- Today: Support Vector Machines. Margins intuition; the
optimization problem; convex optimization; Lagrange multipliers;
Lagrange for SVM. WARNING: linear algebra and functional analysis
coming up.
- Slide 14
- Support Vector Machines: covered today and next time.
- Slide 15
- Notation: Target y is in {-1,+1}. We write the parameters as w
and b. The hyperplane we consider is w^T x + b = 0. Data
D = {(x_i, y_i)}. For now, assume D is linearly separable.
- Primal Problem: minimize f(x) subject to g_i(x) <= 0 and
h_i(x) = 0, with Lagrangian L(x, lambda, nu) = f(x) +
sum_i lambda_i g_i(x) + sum_i nu_i h_i(x). x is primal infeasible
if g_i(x) > 0 for some i or h_i(x) != 0 for some i. If x is primal
infeasible: when g_i(x) > 0 for some i, maximizing over
lambda_i >= 0 makes lambda_i g_i(x) unbounded; when h_i(x) != 0
for some i, maximizing over nu_i makes nu_i h_i(x) unbounded. So
max_{lambda >= 0, nu} L(x, lambda, nu) = infinity for infeasible x.
- Slide 31
- If x is primal feasible: g_i(x) <= 0 for all i, so maximizing
over lambda_i >= 0 gives the optimum at lambda_i = 0; h_i(x) = 0
for all i, so nu_i h_i(x) = 0 and nu_i is irrelevant. Hence
max_{lambda >= 0, nu} L(x, lambda, nu) = f(x) for feasible x.
- Slide 32
- Primal Problem: we made the constraints into a value in the
objective: p* = min_x max_{lambda >= 0, nu} L(x, lambda, nu).
Which is what we are looking for!!! An x attaining this minimum
is an optimal x.
- Slide 33
- Dual Problem: d* = max_{lambda >= 0, nu} min_x L(x, lambda, nu).
(lambda, nu) are dual feasible if lambda_i >= 0 for all i. This
implies that min_x L(x, lambda, nu) <= p* for every dual feasible
(lambda, nu).
- Slide 34
- Weak and Strong Duality: weak duality says d* <= p*. Question:
when are they equal?
- Slide 35
- Strong Duality: Slater's Condition. If f and the g_i are convex,
the h_i are affine, and the problem is strictly feasible (i.e.
there exists a primal feasible x such that g_i(x) < 0 for all i),
then d* = p* (strong duality). Assume from now on that this is
the case.
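Weak and strong duality can be checked numerically on a one-dimensional convex toy problem that satisfies Slater's condition; the problem below is illustrative, not from the slides:

```python
import numpy as np

# Toy problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0.
# Slater holds: x = 2 is strictly feasible (g(2) = -1 < 0).
# Primal optimum: x* = 1, p* = 1.
# Lagrangian: L(x, lam) = x^2 + lam*(1 - x); minimizing over x gives
# x = lam/2, so the dual function is d(lam) = lam - lam^2 / 4.

def dual(lam):
    return lam - lam ** 2 / 4

lams = np.linspace(0, 4, 401)
p_star = 1.0
d_star = max(dual(l) for l in lams)
# Weak duality: every dual value lower-bounds p*.
# Strong duality (Slater): d* = p*, attained at lam* = 2.
```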
- Slide 36
- Complementary Slackness: let x* be primal optimal and
(lambda*, nu*) dual optimal with p* = d*. Then
lambda_i* g_i(x*) = 0 for all i: the terms -lambda_i* g_i(x*) are
all non-negative, and they sum to zero, so each one must be zero.
This is complementary slackness.
- Slide 37
- Karush-Kuhn-Tucker (KKT) Conditions: let x* be primal optimal
and (lambda*, nu*) dual optimal with p* = d*. Then:
g_i(x*) <= 0 and h_i(x*) = 0 for all i (primal feasibility);
lambda_i* >= 0 for all i (dual feasibility);
lambda_i* g_i(x*) = 0 for all i (complementary slackness);
and, since x* minimizes L(x, lambda*, nu*), grad f(x*) +
sum_i lambda_i* grad g_i(x*) + sum_i nu_i* grad h_i(x*) = 0
(stationarity). For convex problems satisfying Slater's condition,
the KKT conditions are necessary and sufficient for optimality.
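The four conditions can be verified by hand on a small constrained problem; the 1-D example below is illustrative, not from the slides:

```python
# Toy problem: minimize f(x) = (x - 2)^2 subject to g(x) = x - 1 <= 0.
# The constraint is active at the optimum, so x* = 1.
# Stationarity: f'(x*) + lam* g'(x*) = 2(x* - 2) + lam* = 0 => lam* = 2.
x_star, lam_star = 1.0, 2.0

primal_feasible = (x_star - 1) <= 0           # g(x*) <= 0
dual_feasible = lam_star >= 0                 # lam* >= 0
comp_slack = lam_star * (x_star - 1) == 0     # lam* g(x*) = 0
stationary = 2 * (x_star - 2) + lam_star == 0
```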
- Slide 38
- Finally Back To SVM: minimize (1/2) ||w||^2 subject to
y_i (w^T x_i + b) >= 1 for all i. Define the Lagrangian
L(w, b, alpha) = (1/2) ||w||^2 - sum_i alpha_i [y_i (w^T x_i + b)
- 1] (no nu is required: there are no equality constraints).
- Slide 39
- SVM Dual Form: we need to minimize L over w and b. Take the
derivative with respect to w and solve for 0:
grad_w L = w - sum_i alpha_i y_i x_i = 0, so
w = sum_i alpha_i y_i x_i. w is a vector that is a specific linear
combination of the input points.
- Slide 40
- SVM Dual Form: the derivative with respect to b is
-sum_i alpha_i y_i, which must be 0. We get the constraint
sum_i alpha_i y_i = 0.
- Slide 41
- SVM Dual Form: insert w = sum_i alpha_i y_i x_i into the
Lagrangian above.
- Slide 42
- SVM Dual Form: insert the constraint sum_i alpha_i y_i = 0 above;
the terms involving b vanish.
- Slide 43
- SVM Dual Form: L(alpha) = sum_i alpha_i
- (1/2) sum_i sum_j alpha_i alpha_j y_i y_j x_i^T x_j.
- Slide 44
- SVM Dual Problem: we found the minimum over w and b; now maximize
over alpha: maximize sum_i alpha_i
- (1/2) sum_i sum_j alpha_i alpha_j y_i y_j x_i^T x_j subject to
alpha_i >= 0 for all i and sum_i alpha_i y_i = 0. Remember
w = sum_i alpha_i y_i x_i.
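On a tiny two-point data set the dual can be solved by a simple grid search (the data are illustrative; with one point per class, the constraint sum_i alpha_i y_i = 0 forces both multipliers to be equal):

```python
import numpy as np

# Tiny separable data set: one point per class.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])

def dual_objective(alpha):
    # sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j
    Z = y[:, None] * X
    return alpha.sum() - 0.5 * alpha @ (Z @ Z.T) @ alpha

# The constraint forces alpha_1 = alpha_2 = a here, so maximize
# over a single scalar a >= 0 on a grid.
grid = np.linspace(0, 1, 1001)
a_best = grid[np.argmax([dual_objective(np.array([a, a])) for a in grid])]
alpha = np.array([a_best, a_best])

w = (alpha * y) @ X   # w = sum_i alpha_i y_i x_i
b = y[0] - w @ X[0]   # from the tight constraint y_1 (w^T x_1 + b) = 1
```

For these two points the max-margin hyperplane is x_1 = 0, i.e. w = (1, 0) and b = 0, which the grid search recovers.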
- Slide 45
- Intercept b*: for a support vector the constraint is tight,
y_i (w*^T x_i + b*) = 1. Case y_i = 1: b* = 1 - w*^T x_i. Case
y_i = -1: b* = -1 - w*^T x_i. In both cases b* = y_i - w*^T x_i.
- Slide 46
- Making Predictions: predict sign(w*^T x + b*) =
sign(sum_{i in SV} alpha_i* y_i x_i^T x + b*); only the support
vectors contribute.
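A sketch of the prediction rule using only the support vectors; the support vectors, multipliers, and intercept below are made-up illustrative values:

```python
import numpy as np

# f(x) = sign( sum_{i in SV} alpha_i y_i x_i^T x + b )
X_sv = np.array([[1.0, 0.0], [-1.0, 0.0]])  # support vectors
y_sv = np.array([1.0, -1.0])                # their labels
alpha = np.array([0.5, 0.5])                # their multipliers
b = 0.0                                     # intercept

def predict(x):
    return np.sign((alpha * y_sv) @ (X_sv @ x) + b)

pred = predict(np.array([2.0, 3.0]))
```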
- Slide 47
- Support Vectors: by complementary slackness, alpha_i* > 0 implies
y_i (w*^T x_i + b*) = 1, so such x_i lie exactly on the margin,
and w is determined by these points alone. The support vectors are
the vectors that support the plane.
- Slide 48
- SVM Summary: minimize (1/2) ||w||^2 subject to
y_i (w^T x_i + b) >= 1; solve via the dual;
w = sum_i alpha_i y_i x_i is a combination of the support vectors.