Machine Learning Week 4 Lecture 1
Hand-In: Data is coming online later today. I keep a test set with approx. 1000 test images; that will be your real test. You are most welcome to add regularization, as we discussed last week; it is not a requirement. Hand-in Version 4 is available.
Learning Theory Perspective: out-of-sample error ≤ in-sample error + model complexity. Instead of picking a simpler hypothesis set, prefer simpler hypotheses h from H: define what "simple" means in a complexity measure Ω(h), and minimize the in-sample error plus the complexity penalty.
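In symbols, a sketch following the convention of the course book (Learning From Data); the λ/N scaling is my assumption about the slide's convention:

```latex
% Augmented error: in-sample error plus a penalty on the complexity of h
E_{\mathrm{aug}}(h) \;=\; E_{\mathrm{in}}(h) \;+\; \frac{\lambda}{N}\,\Omega(h),
\qquad \text{choose } h \in \mathcal{H} \text{ minimizing } E_{\mathrm{aug}}(h).
```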
Slide 6
Regularization: in-sample error + model complexity. Weight Decay: penalize large weights by adding λ wᵀw to the in-sample error. With gradient descent the update becomes w ← (1 − 2ηλ) w − η ∇E_in(w): every round we take a step towards the zero vector before stepping along the gradient.
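A minimal sketch of this update for squared error, assuming a plain gradient-descent setup (the function and variable names are mine, not from the slides):

```python
import numpy as np

def gd_weight_decay(X, y, lam=0.1, eta=0.01, steps=1000):
    """Gradient descent on squared error with weight decay.

    Minimizes E_in(w) + lam * w.w; the exact scaling of lam is an
    assumption about the slide's convention.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n          # gradient of E_in
        w = (1 - 2 * eta * lam) * w - eta * grad  # shrink towards 0, then step
    return w
```

The factor (1 − 2ηλ) < 1 is the decay: it moves w towards the zero vector on every round.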
Slide 7
Why are small weights better? Practical perspective: because in practice we believe that noise is noisy. Stochastic noise is high frequency; deterministic noise is also non-smooth. Penalizing large weights favours smooth hypotheses, which fit the signal rather than the noise. Sometimes the weights are weighed differently (per-coordinate penalties); the bias term gets a free ride, i.e. it is typically not penalized.
Slide 8
Regularization Summary: more art than science. Use VC theory and bias-variance as guides. Weight decay is a universal technique, built on the practical belief that noise is noisy (non-smooth). Question: which regularizer to use? Many other regularizers exist. Extremely important; the book calls regularization a "necessary evil".
Slide 9
Validation: regularization estimates the model-complexity penalty, while validation estimates the out-of-sample error directly (the validation estimate). Remember the test set: once a set is used to make choices, it is no longer an unbiased test set.
Slide 10
Model Selection: given t models m_1, ..., m_t, which is better? Train each on D_train and validate on D_val, computing E_val(m_1), E_val(m_2), ..., E_val(m_t). Pick the model with minimum validation error. Use this to find λ for my weight decay.
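A sketch of this selection loop, using ridge regression's closed form as the stand-in model (the model choice and all names are illustrative, not from the slides):

```python
import numpy as np

def select_lambda(X_train, y_train, X_val, y_val, lambdas):
    """Train one model per lambda on D_train, score it on D_val,
    and return the lambda with minimum validation error."""
    best_lam, best_err = None, np.inf
    d = X_train.shape[1]
    for lam in lambdas:
        # closed-form weight-decay (ridge) solution on D_train
        w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d),
                            X_train.T @ y_train)
        err = np.mean((X_val @ w - y_val) ** 2)   # E_val(m)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```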
Slide 11
Cross Validation: a dilemma in choosing the validation set size K. As K increases, the E_val estimate tightens (more validation points), but E_val itself increases, because the model is trained on fewer points. Small K versus large K: we would like to have both. The answer is cross validation.
Slide 12
K-Fold Cross Validation: split the data into N/K parts of size K. Train on all but one part and test on the remaining one, repeating for each part. Pick the model that is best on average over the N/K partitions. Usual choice: K = N/10, i.e. 10 folds (we do not have all day).
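A sketch using the slide's convention (parts of size K, hence N/K folds); `train_and_eval` is a hypothetical callback standing in for whichever model is being scored:

```python
import numpy as np

def cross_val_error(X, y, train_and_eval, K):
    """Average test error over N/K folds, each of size approx. K.

    train_and_eval(X_tr, y_tr, X_te, y_te) must train a model and
    return its error on (X_te, y_te).
    """
    N = len(y)
    folds = np.array_split(np.random.permutation(N), N // K)
    errors = []
    for fold in folds:
        mask = np.ones(N, dtype=bool)
        mask[fold] = False                  # train on all but this part
        errors.append(train_and_eval(X[mask], y[mask], X[fold], y[fold]))
    return float(np.mean(errors))           # average over the N/K partitions
```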
Slide 13
Today: Support Vector Machines. Margins intuition; the optimization problem; convex optimization; Lagrange multipliers; the Lagrangian for SVM. WARNING: linear algebra and functional analysis coming up.
Slide 14
Support Vector Machines: today and next time.
Slide 15
Notation: the target y is in {-1, +1}. We write the parameters as w and b. The hyperplane we consider is wᵀx + b = 0. Data D = {(x_i, y_i)}. For now, assume D is linearly separable.
Primal Problem: minimize f(x) subject to g_i(x) ≤ 0 and h_i(x) = 0, with Lagrangian L(x, α, β) = f(x) + Σ_i α_i g_i(x) + Σ_i β_i h_i(x). Consider θ_P(x) = max over α ≥ 0 and β of L(x, α, β). If x is primal infeasible: g_i(x) > 0 for some i, and maximizing over α_i ≥ 0 makes α_i g_i(x) unbounded; or h_i(x) ≠ 0 for some i, and maximizing over β_i makes β_i h_i(x) unbounded. x is primal infeasible if g_i(x) > 0 for some i or h_i(x) ≠ 0 for some i.
Slide 31
Primal Problem: If x is primal feasible: g_i(x) ≤ 0 for all i, so maximizing over α_i ≥ 0 the optimum is α_i = 0; and h_i(x) = 0 for all i, so β_i h_i(x) = 0 and β_i is irrelevant. Hence θ_P(x) = f(x) for feasible x.
Slide 32
Primal Problem: We made the constraints into values in the objective: θ_P(x) = f(x) if x is feasible and ∞ otherwise. Therefore min_x θ_P(x) = min_x max over α ≥ 0, β of L(x, α, β) = p*, which is what we are looking for!!! A minimizer is an optimal x.
Slide 33
Dual Problem: define θ_D(α, β) = min_x L(x, α, β). (α, β) are dual feasible if α_i ≥ 0 for all i. This implies θ_D(α, β) ≤ p* for any dual feasible (α, β).
Slide 34
Weak and Strong Duality: weak duality says d* = max over dual feasible (α, β) of θ_D(α, β) ≤ p*. Question: when are they equal?
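The chain behind weak duality (a reconstruction from the definitions above):

```latex
% For any dual feasible (alpha, beta) and any primal feasible x-tilde,
% using alpha_i g_i(x) <= 0 and h_i(x) = 0 at feasible points:
\theta_D(\alpha,\beta) = \min_x L(x,\alpha,\beta)
  \;\le\; L(\tilde{x},\alpha,\beta)
  \;\le\; f(\tilde{x})
\quad\Longrightarrow\quad
d^* = \max_{\alpha \ge 0,\,\beta} \theta_D(\alpha,\beta)
  \;\le\; \min_{\tilde{x}\ \text{feasible}} f(\tilde{x}) = p^*.
```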
Slide 35
Strong Duality: Slater's Condition. If f and the g_i are convex, the h_i are affine, and the problem is strictly feasible, e.g. there exists a primal feasible x such that g_i(x) < 0 for all i, then d* = p* (strong duality). Assume from now on that this is the case.
Slide 36
Complementary Slackness: Let x* be primal optimal and (α*, β*) dual optimal with p* = d*. The terms -α_i* g_i(x*) are all non-negative, yet they must sum to zero, so α_i* g_i(x*) = 0 for all i: complementary slackness.
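The standard chain of (in)equalities behind this slide (reconstructed; the slide's layout was lost in extraction):

```latex
% With h_i(x*) = 0, strong duality forces every step to be tight:
f(x^*) \;=\; \theta_D(\alpha^*,\beta^*)
       \;=\; \min_x L(x,\alpha^*,\beta^*)
       \;\le\; L(x^*,\alpha^*,\beta^*)
       \;=\; f(x^*) + \sum_i \alpha_i^* g_i(x^*)
       \;\le\; f(x^*).
```

Each term α_i* g_i(x*) is non-positive, yet the terms sum to zero, so every one of them is zero.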
Slide 37
Karush-Kuhn-Tucker (KKT) Conditions: Let x* be primal optimal and (α*, β*) dual optimal with p* = d*. Then:
g_i(x*) ≤ 0 for all i (primal feasibility)
h_i(x*) = 0 for all i (primal feasibility)
α_i* ≥ 0 for all i (dual feasibility)
α_i* g_i(x*) = 0 for all i (complementary slackness)
∇_x L(x*, α*, β*) = 0, since x* minimizes L(x, α*, β*) (stationarity)
Under strong duality, the KKT conditions for optimality are necessary and sufficient.
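A one-dimensional worked example of the KKT conditions (my example, not from the slides): minimize f(x) = x² subject to g(x) = 1 - x ≤ 0.

```latex
L(x,\alpha) = x^2 + \alpha\,(1 - x), \qquad \alpha \ge 0.
% Stationarity:             dL/dx = 2x - \alpha = 0.
% Complementary slackness:  \alpha (1 - x) = 0.
% alpha = 0 would force x = 0, violating g(0) = 1 <= 0,
% so 1 - x = 0: x^* = 1 and \alpha^* = 2 >= 0.
```

All four conditions hold at (x*, α*) = (1, 2), which is indeed the constrained minimum.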
Slide 38
Finally Back To SVM: minimize (1/2)‖w‖² subject to y_i(wᵀx_i + b) ≥ 1 for all i. Define the Lagrangian L(w, b, α) = (1/2)‖w‖² - Σ_i α_i [y_i(wᵀx_i + b) - 1] (no β required, since there are no equality constraints).
Slide 39
SVM Dual Form: We need to minimize L over w and b. Take derivatives and solve for 0: ∇_w L = w - Σ_i α_i y_i x_i = 0 gives w = Σ_i α_i y_i x_i, so w is a vector that is a specific linear combination of the input points.
Slide 40
SVM Dual Form: ∂L/∂b = -Σ_i α_i y_i, which must be 0. We get the constraint Σ_i α_i y_i = 0.
Slide 41
SVM Dual Form: insert w = Σ_i α_i y_i x_i into the Lagrangian above.
Slide 42
SVM Dual Form: insert the constraint Σ_i α_i y_i = 0 above; the term b Σ_i α_i y_i vanishes.
Slide 43
SVM Dual Form: what remains is L(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j x_iᵀx_j.
Slide 44
SVM Dual Problem: We found the minimum over w and b; now maximize over α: maximize Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j x_iᵀx_j subject to α_i ≥ 0 for all i and Σ_i α_i y_i = 0. Remember w = Σ_i α_i y_i x_i.
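A sketch that solves this dual numerically with SciPy's generic SLSQP solver; a real implementation would use a dedicated QP or SMO solver, and all names here are mine:

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual_fit(X, y):
    """Hard-margin SVM via the dual; assumes linearly separable data."""
    n = len(y)
    Z = y[:, None] * X                      # rows z_i = y_i x_i
    G = Z @ Z.T                             # G_ij = y_i y_j x_i^T x_j

    def neg_dual(a):                        # minimize the negated dual objective
        return 0.5 * a @ G @ a - a.sum()

    res = minimize(neg_dual, np.zeros(n),
                   bounds=[(0, None)] * n,               # alpha_i >= 0
                   constraints={"type": "eq",
                                "fun": lambda a: a @ y})  # sum_i alpha_i y_i = 0
    alpha = res.x
    w = Z.T @ alpha                         # w = sum_i alpha_i y_i x_i
    sv = alpha > 1e-6                       # support vectors
    b = np.mean(y[sv] - X[sv] @ w)          # from the tight constraints y_i(w.x_i + b) = 1
    return w, b, alpha
```

Predictions are then np.sign(X_new @ w + b), matching the next slides.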
Slide 45
Intercept b*: for a support vector the constraint is tight, y_i(w*ᵀx_i + b*) = 1. Case y_i = +1: b* = 1 - w*ᵀx_i. Case y_i = -1: b* = -1 - w*ᵀx_i.
Slide 46
Making Predictions: predict the sign of w*ᵀx + b* = Σ_i α_i* y_i x_iᵀx + b*. Only the support vectors (points with α_i* > 0) contribute to the sum.
Slide 47
w and Complementary Slackness: α_i* > 0 implies y_i(w*ᵀx_i + b*) = 1, so these points lie exactly on the margin. Support vectors are the vectors that support the plane.