Machine Learning Week 4 Lecture 1



  • Slide 1
  • Machine Learning Week 4 Lecture 1
  • Slide 2
  • Hand In: Data is coming online later today. I keep a test set with approx. 1000 test images; that will be your real test. You are most welcome to add regularization as we discussed last week; it is not a requirement. Hand-in Version 4 is available.
  • Slide 3
  • Recap: what is going on, and ways to fix it
  • Slide 4
  • Overfitting: Data increases -> overfitting decreases. Noise increases -> overfitting increases. Target complexity increases -> overfitting increases.
  • Slide 5
  • Learning Theory Perspective: in-sample error + model complexity. Instead of picking a simpler hypothesis set, prefer simpler hypotheses h from H. Define what simple means with a complexity measure Ω(h), and minimize the sum of in-sample error and complexity (written out below).
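    One concrete form of this objective is the weight-decay augmented error (a sketch; the λ/N scaling is one common textbook convention and is an assumption here, not necessarily the exact form on the slide):

        E_{\mathrm{aug}}(w) \;=\; E_{\mathrm{in}}(w) \;+\; \frac{\lambda}{N}\, w^{\top} w

    Minimizing E_aug trades off the fit against the size of the weights.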
  • Slide 6
  • Regularization: in-sample error + model complexity. Weight decay: every round of gradient descent we also take a small step towards the zero vector (sketch below).
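    A minimal sketch of one weight-decay gradient step (the function name, the learning rate eta, and the decay strength lam are illustrative assumptions):

        import numpy as np

        def weight_decay_step(w, grad_Ein, eta=0.1, lam=0.01, N=100):
            """One gradient step on the augmented error E_in(w) + (lam/N) * w.w.

            Besides following the negative gradient of E_in, the weights are
            shrunk by the factor (1 - 2*eta*lam/N): a small step towards the
            zero vector every round.
            """
            return (1 - 2 * eta * lam / N) * w - eta * grad_Ein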
  • Slide 7
  • Why are small weights better? Practical perspective: because in practice we believe that noise is noisy. Stochastic noise is high frequency, and deterministic noise is also non-smooth. Sometimes the weights are weighted differently; the bias term gets a free ride.
  • Slide 8
  • Regularization Summary: more art than science. Use VC and bias-variance as guides. Weight decay is a universal technique; the practical belief is that noise is noisy (non-smooth). Question: which regularizer to use? Many other regularizers exist. Extremely important. Quote from the book: a necessary evil.
  • Slide 9
  • Validation: regularization estimates the overfit penalty, while validation estimates the out-of-sample error directly. Remember the test set.
  • Slide 10
  • Model Selection: t models m_1, ..., m_t. Which is better? Train each on D_train, validate on D_val, compute E_val(m_1), E_val(m_2), ..., E_val(m_t), and pick the one with the minimum validation error. Use this to find λ for my weight decay (sketch below).
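    A minimal sketch of this selection loop (the fit/error helpers and the candidate λ grid are illustrative assumptions, not a fixed API):

        import numpy as np

        def select_lambda(lambdas, fit, error, X_train, y_train, X_val, y_val):
            """Train one model per candidate lambda on D_train and pick the
            one with the smallest validation error E_val on D_val."""
            e_vals = []
            for lam in lambdas:
                model = fit(X_train, y_train, lam)          # train on D_train only
                e_vals.append(error(model, X_val, y_val))   # estimate E_val
            best = int(np.argmin(e_vals))
            return lambdas[best], e_vals[best]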
  • Slide 11
  • Cross-Validation: increasing the validation set size K gives a dilemma: the E_val estimate tightens, but E_val itself increases (less data is left for training). Small K versus large K: we would like to have both. Cross-validation.
  • Slide 12
  • K-Fold Cross-Validation: split the data into N/K parts of size K. Train on all but one part, test on the remaining one. Pick the model that is best on average over the N/K partitions. Usually K = N/10, i.e. 10 folds (we do not have all day). A sketch follows below.
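    A minimal sketch of the procedure with 10 folds, i.e. parts of size K = N/10 (the fit/error helpers are the same illustrative assumptions as above):

        import numpy as np

        def cross_val_error(lam, fit, error, X, y, n_folds=10):
            """Average validation error of one model (one lambda) over
            n_folds partitions: train on all folds but one, test on the
            held-out fold."""
            folds = np.array_split(np.arange(len(y)), n_folds)
            errs = []
            for k in range(n_folds):
                val_idx = folds[k]
                train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
                model = fit(X[train_idx], y[train_idx], lam)
                errs.append(error(model, X[val_idx], y[val_idx]))
            return float(np.mean(errs))

    The candidate with the lowest average error over the folds is the one picked.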
  • Slide 13
  • Today: Support Vector Machines. Margins intuition, the optimization problem, convex optimization, Lagrange multipliers, the Lagrangian for SVM. WARNING: linear algebra and function analysis coming up.
  • Slide 14
  • Support Vector Machines Today Next Time
  • Slide 15
  • Notation: The target y is in {-1,+1}. We write the parameters as w and b. The hyperplane we consider is w^T x + b = 0. The data is D = {(x_i, y_i)}. For now, assume D is linearly separable.
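    The following slides analyze a generic constrained problem through its Lagrangian. For reference, a standard statement of that setup (the multiplier names α_i for the inequality constraints and β_i for the equality constraints are an assumed notation, matching how they are used below):

        \min_x \; f(x) \quad \text{s.t.} \quad g_i(x) \le 0 \;\;\forall i, \qquad h_i(x) = 0 \;\;\forall i

        \mathcal{L}(x, \alpha, \beta) \;=\; f(x) \;+\; \sum_i \alpha_i\, g_i(x) \;+\; \sum_i \beta_i\, h_i(x)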
  • Primal Problem: If x is primal infeasible, then either g_i(x) > 0 for some i, and maximizing over α_i ≥ 0 the term α_i g_i(x) is unbounded, or h_i(x) ≠ 0 for some i, and maximizing over β_i the term β_i h_i(x) is unbounded. So x is primal infeasible if g_i(x) > 0 for some i or h_i(x) ≠ 0 for some i.
  • Slide 31
  • If x is primal feasible: g_i(x) ≤ 0 for all i, so maximizing over α_i ≥ 0 the optimum is α_i = 0; and h_i(x) = 0 for all i, so β_i h_i(x) = 0 and β_i is irrelevant.
  • Slide 32
  • Primal Problem: we made the constraints into a value in the optimization function: the max over α ≥ 0, β of L(x, α, β) equals f(x) when x is primal feasible and is unbounded otherwise. Hence min_x max_{α ≥ 0, β} L(x, α, β) = p*, which is what we are looking for!!! A minimizing x is an optimal x.
  • Slide 33
  • Dual Problem: (α, β) are dual feasible if α_i ≥ 0 for all i. This implies that the dual value lower-bounds the primal value (written out below).
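    The implied chain, written out (the standard weak-duality argument, sketched; x̃ denotes any primal feasible point):

        \min_x \mathcal{L}(x, \alpha, \beta) \;\le\; \mathcal{L}(\tilde{x}, \alpha, \beta) \;=\; f(\tilde{x}) + \sum_i \alpha_i\, g_i(\tilde{x}) + \sum_i \beta_i\, h_i(\tilde{x}) \;\le\; f(\tilde{x})

        \text{hence} \quad d^* \;=\; \max_{\alpha \ge 0,\, \beta} \; \min_x \mathcal{L}(x, \alpha, \beta) \;\le\; p^*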
  • Slide 34
  • Weak and Strong Duality: weak duality says d* ≤ p*. Question: when are they equal?
  • Slide 35
  • Strong Duality, Slater's Condition: If f and the g_i are convex, the h_i are affine, and the problem is strictly feasible, i.e. there exists a primal feasible x such that g_i(x) < 0 for all i, then d* = p* (strong duality). Assume that this is the case.
  • Slide 36
  • Complementary Slackness: Let x* be primal optimal and α*, β* dual optimal (p* = d*). The terms -α_i* g_i(x*) are all non-negative, yet they must sum to zero, so α_i* g_i(x*) = 0 for all i: complementary slackness (derivation below).
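    The argument, written out (a standard derivation, sketched; the β terms vanish because h_i(x*) = 0):

        f(x^*) \;=\; d^* \;=\; \min_x \mathcal{L}(x, \alpha^*, \beta^*) \;\le\; \mathcal{L}(x^*, \alpha^*, \beta^*) \;=\; f(x^*) + \sum_i \alpha_i^*\, g_i(x^*) \;\le\; f(x^*)

    Every inequality must therefore be tight, and since each α_i* g_i(x*) ≤ 0 while their sum is 0, each term is 0.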
  • Slide 37
  • Karush-Kuhn-Tucker (KKT) Conditions: Let x* be primal optimal and α*, β* dual optimal (p* = d*). Then g_i(x*) ≤ 0 for all i and h_i(x*) = 0 for all i (primal feasibility); α_i* ≥ 0 for all i (dual feasibility); α_i* g_i(x*) = 0 for all i (complementary slackness); and, since x* minimizes L(x, α*, β*), the gradient ∇_x L(x*, α*, β*) = 0 (stationarity). The KKT conditions for optimality are necessary and sufficient.
  • Slide 38
  • Finally Back To SVM: minimize (1/2)||w||^2 subject to y_i (w^T x_i + b) ≥ 1 for all i. Define the Lagrangian (no equality constraints required); it is written out below.
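    Written out (the 1/2 scaling and the margin normalization y_i (w^T x_i + b) ≥ 1 are the standard hard-margin formulation and are assumed here):

        \min_{w, b} \; \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\,(w^{\top} x_i + b) \ge 1 \;\;\forall i

        \mathcal{L}(w, b, \alpha) \;=\; \tfrac{1}{2}\|w\|^2 \;-\; \sum_i \alpha_i \big[\, y_i\,(w^{\top} x_i + b) - 1 \,\big], \qquad \alpha_i \ge 0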
  • Slide 39
  • SVM Dual Form: we need to minimize L over w and b. Take the derivative with respect to w and solve for 0: ∇_w L = w - Σ_i α_i y_i x_i = 0, so w = Σ_i α_i y_i x_i. Thus w is a vector that is a specific linear combination of the input points.
  • Slide 40
  • SVM Dual Form: the derivative with respect to b is -Σ_i α_i y_i, which must be 0. We get the constraint Σ_i α_i y_i = 0.
  • Slide 41
  • SVM Dual Form: insert w = Σ_i α_i y_i x_i into the Lagrangian above.
  • Slide 42
  • SVM Dual Form: insert the constraint Σ_i α_i y_i = 0 from above; the term involving b drops out.
  • Slide 43
  • SVM Dual Form
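    Carrying out these substitutions gives the dual objective (a standard computation, sketched):

        \mathcal{L}\big|_{\,w = \sum_i \alpha_i y_i x_i,\;\; \sum_i \alpha_i y_i = 0} \;=\; \sum_i \alpha_i \;-\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, x_i^{\top} x_j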
  • Slide 44
  • SVM Dual Problem: having found the minimum over w and b, now maximize over α: maximize Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j, subject to α_i ≥ 0 for all i and Σ_i α_i y_i = 0. Remember w = Σ_i α_i y_i x_i.
  • Slide 45
  • Intercept b*: Case y_i = 1: the closest positive points satisfy w*^T x_i + b* = 1. Case y_i = -1: the closest negative points satisfy w*^T x_i + b* = -1. Either active constraint gives b* = y_i - w*^T x_i for any support vector x_i.
  • Slide 46
  • Making Predictions: predict the sign of w*^T x + b* = Σ_i α_i* y_i x_i^T x + b*. Only the support vectors (α_i* > 0) contribute to the sum (sketch below).
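    A minimal sketch of this prediction rule (names are illustrative; alpha, sv_x and sv_y hold the multipliers, inputs and labels of the support vectors):

        import numpy as np

        def svm_predict(x, alpha, sv_x, sv_y, b):
            """Predict sign(w.x + b) with w in its dual form
            w = sum_i alpha_i * y_i * x_i; only support vectors
            (alpha_i > 0) appear in the sum."""
            score = np.sum(alpha * sv_y * (sv_x @ x)) + b
            return np.sign(score)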
  • Slide 47
  • w and Complementary Slackness: α_i* [y_i (w*^T x_i + b*) - 1] = 0, so α_i* > 0 only for points with y_i (w*^T x_i + b*) = 1, i.e. points on the margin. w* = Σ_i α_i* y_i x_i is therefore determined by these points alone: the support vectors are the vectors that support the plane.
  • Slide 48
  • SVM Summary: maximize Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j subject to α_i ≥ 0 and Σ_i α_i y_i = 0. The support vectors are the points with α_i > 0, and w = Σ_i α_i y_i x_i. A quick way to experiment is sketched below.
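    For experimentation, a linear SVM can be fit with scikit-learn (an illustrative sketch, not part of the lecture; a very large C approximates the hard margin on separable data):

        import numpy as np
        from sklearn.svm import SVC

        # Tiny linearly separable toy set (illustrative data).
        X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                      [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
        y = np.array([-1, -1, -1, 1, 1, 1])

        clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
        print(clf.support_vectors_)       # the support vectors
        print(clf.dual_coef_)             # alpha_i * y_i for the support vectors
        print(clf.coef_, clf.intercept_)  # w and b of the separating hyperplane
        print(clf.predict([[2.0, 2.0]]))  # sign(w.x + b) for a new point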