
Announcements - Homework

• Homework 1 is graded, please collect it at the end of lecture

• Homework 2 due today

• Homework 3 out soon (watch email)

• Question 1 – midterm review

HW1 score distribution

[Histogram of HW1 total scores: counts per 10-point bin, from 0~10 up to 100~110.]

Announcements - Midterm

• When: Wednesday, 10/20

• Where: In Class

• What: You, your pencil, your textbook, your notes, course slides, your calculator, your good mood :)

• What NOT: No computers, iPhones, or anything else with an internet connection.

• Material: Everything from the beginning of the semester, up to and including SVMs and the kernel trick


Recitation Tomorrow!

• Boosting, SVM (convex optimization), midterm review!

• Strongly recommended!

• Place: NSH 3305 (Note: change from last time)

• Time: 5-6 pm

Rob

Support Vector Machines

Aarti Singh

Machine Learning 10-701/15-781, Oct 13, 2010

At Pittsburgh G-20 summit …


Linear classifiers – which line is better?


Pick the one with the largest margin!


Parameterizing the decision boundary


Decision rule: predict +1 if w·x + b > 0, predict −1 if w·x + b < 0

Data: examples (x_i, y_i), i = 1, 2, …, n, with labels y_i ∈ {+1, −1}

w·x = Σ_j w^(j) x^(j)  (sum over the coordinates j)

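As a concrete illustration of this decision rule, here is a minimal Python sketch; the numbers chosen for w and b are made up:

```python
import numpy as np

# Hypothetical weight vector and bias, just to illustrate the rule:
# predict +1 if w.x + b > 0, predict -1 if w.x + b < 0.
w = np.array([2.0, -1.0])
b = -0.5

def predict(x):
    # w.x = sum_j w(j) x(j), then add the bias b and take the sign
    return 1 if np.dot(w, x) + b > 0 else -1

print(predict(np.array([1.0, 0.5])))   # 2*1 + (-1)*0.5 - 0.5 = 1.0 > 0  -> +1
print(predict(np.array([0.0, 1.0])))   # 2*0 + (-1)*1 - 0.5 = -1.5 < 0  -> -1
```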

Maximizing the margin

margin γ = 2a/‖w‖

Distance of closest examples from the line/hyperplane

Maximizing the margin

max_{w,b} γ = 2a/‖w‖

s.t. (w·x_j + b) y_j ≥ a  ∀ j

margin γ = 2a/‖w‖ (the distance of the closest examples from the line/hyperplane)

Note: ‘a’ is arbitrary (can normalize equations by a)
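Since a is arbitrary, we can normalize so that a = 1; maximizing 2/‖w‖ is then the same as minimizing w·w, which gives the formulation on the next slide. A sketch of that step:

```latex
\max_{w,b}\ \frac{2a}{\|w\|} \quad \text{s.t. } (w\cdot x_j + b)\, y_j \ge a \ \ \forall j
\;\;\overset{a\,=\,1}{\Longrightarrow}\;\;
\min_{w,b}\ w\cdot w \quad \text{s.t. } (w\cdot x_j + b)\, y_j \ge 1 \ \ \forall j
```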

Support Vector Machines

min_{w,b} w·w

s.t. (w·x_j + b) y_j ≥ 1  ∀ j

Solve efficiently by quadratic programming (QP)

– Well-studied solution algorithms

Linear hyperplane defined by “support vectors”
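For illustration, here is a minimal sketch of this QP written with the cvxpy modeling library on a made-up, linearly separable toy dataset (not from the lecture); any off-the-shelf QP solver would do:

```python
import numpy as np
import cvxpy as cp

# Made-up, linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

objective = cp.Minimize(cp.sum_squares(w))          # min w.w
constraints = [cp.multiply(y, X @ w + b) >= 1]      # (w.x_j + b) y_j >= 1 for all j
cp.Problem(objective, constraints).solve()

print(w.value, b.value)
```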

Support Vectors


Linear hyperplane defined by “support vectors”

Moving other points a little doesn’t affect the decision boundary

Only need to store the support vectors to predict labels of new points

How many support vectors are there in the linearly separable case?

≤ m+1 (where m is the number of input features/dimensions)
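A quick sketch of this with scikit-learn (toy data made up; a large C approximates the hard-margin case), showing that only the support vectors are needed:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
print(clf.support_vectors_)                   # the points that define the boundary
print(clf.predict([[1.0, 1.5]]))              # new points are labeled using only those
```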

What if data is not linearly separable?


Use features of features of features of features….

But run risk of overfitting!

x1², x2², x1x2, …, exp(x1)
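A minimal sketch of such a feature expansion in Python (the particular features below are just the ones listed above, chosen for illustration):

```python
import numpy as np

def feature_map(X):
    # Map (x1, x2) to a richer, nonlinear feature space:
    # x1, x2, x1^2, x2^2, x1*x2, exp(x1)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2, np.exp(x1)])

X = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]])
print(feature_map(X).shape)   # (4, 6): a linear classifier here is nonlinear in the original x
```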

What if data is still not linearly separable?

min_{w,b} w·w + C (#mistakes)

s.t. (w·x_j + b) y_j ≥ 1  ∀ j

Allow “error” in classification

Maximize margin and minimize # mistakes on training data

C - tradeoff parameter

Not a QP (counting mistakes gives a combinatorial, non-convex objective)

0/1 loss doesn’t distinguish between a near miss and a bad mistake

What if data is still not linearly separable?

min_{w,b} w·w + C Σ_j ξ_j

s.t. (w·x_j + b) y_j ≥ 1 − ξ_j  ∀ j
     ξ_j ≥ 0  ∀ j

Allow “error” in classification

ξ_j – “slack” variables (ξ_j > 1 if x_j is misclassified)

Pay a linear penalty for each mistake

C – tradeoff parameter (chosen by cross-validation)

Still a QP

Soft margin approach
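A sketch of choosing C by cross-validation with scikit-learn (the data and the grid of C values are made up for illustration):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

# Made-up, noisy, not perfectly separable data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=100) > 0, 1, -1)

# Pick the margin/error tradeoff C by 5-fold cross-validation
search = GridSearchCV(LinearSVC(), param_grid={"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```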

Slack variables – Hinge loss

min_{w,b} w·w + C Σ_j ξ_j   (w·w: complexity penalization; C Σ_j ξ_j: hinge loss)

s.t. (w·x_j + b) y_j ≥ 1 − ξ_j  ∀ j
     ξ_j ≥ 0  ∀ j

[Plot: hinge loss vs. 0-1 loss as a function of the margin (w·x + b) y]
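To make the slack/hinge connection concrete: ξ_j is exactly the hinge loss max(0, 1 − (w·x_j + b) y_j). A small sketch with made-up w, b, and data:

```python
import numpy as np

# Made-up classifier and points
w, b = np.array([1.0, 1.0]), 0.0
X = np.array([[2.0, 1.0], [0.5, 0.2], [-0.5, 0.2]])
y = np.array([1, 1, 1])

margins = y * (X @ w + b)            # (w.x_j + b) y_j
xi = np.maximum(0.0, 1.0 - margins)  # slack = hinge loss
print(xi)  # [0.  0.3 1.3]: 0 = outside margin, (0,1] = inside margin, >1 = misclassified
```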

SVM vs. Logistic Regression

SVM: hinge loss

Logistic regression: log loss (negative log conditional likelihood)

[Plot: hinge loss, log loss, and 0-1 loss as functions of the margin (w·x + b) y]
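A small numeric comparison of the three losses as functions of the margin (w·x + b) y (illustrative values only; log loss is taken with the natural log):

```python
import numpy as np

m = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # margin (w.x + b) y
zero_one = (m <= 0).astype(float)           # 0-1 loss: 1 on a mistake, 0 otherwise
hinge = np.maximum(0.0, 1.0 - m)            # SVM: hinge loss
log_loss = np.log(1.0 + np.exp(-m))         # logistic regression: log loss

for row in zip(m, zero_one, hinge, log_loss):
    print(row)   # hinge loss upper-bounds the 0-1 loss; log loss is a smooth surrogate
```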

What about multiple classes?


One against all


Learn 3 classifiers separately: Class k vs. rest

(w_k, b_k), k = 1, 2, 3

y = arg max_k (w_k·x + b_k)

But the w_k's may not be on the same scale. Note: for any a > 0, (a·w)·x + (a·b) defines the same classifier, so scores from separately trained classifiers are not directly comparable.
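A sketch of one-against-all with three linear classifiers (made-up Gaussian blobs as data; LinearSVC is just one convenient binary learner):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up 3-class data: three Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(30, 2)) for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 30)

classifiers = []
for k in range(3):
    clf = LinearSVC().fit(X, (y == k).astype(int))   # class k vs. rest
    classifiers.append(clf)

scores = np.column_stack([clf.decision_function(X) for clf in classifiers])
y_pred = np.argmax(scores, axis=1)   # caveat from above: the k scores may not share a scale
print((y_pred == y).mean())
```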

Learn 1 classifier: Multi-class SVM

Simultaneously learn 3 sets of weights

y = arg max_k (w^(k)·x + b^(k))

Margin – gap between the score of the correct class and that of the nearest other class

Learn 1 classifier: Multi-class SVM

Simultaneously learn 3 sets of weights

y = arg max_k (w^(k)·x + b^(k))

Joint optimization: the w^(k)'s are on the same scale, so the scores are directly comparable.
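For contrast with the one-against-all sketch above, here is a jointly optimized multi-class SVM in scikit-learn via the Crammer-Singer formulation; the multi_class option of LinearSVC is assumed to be available in your version:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Same kind of made-up 3-class data as in the one-against-all sketch
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(30, 2)) for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 30)

# Joint optimization over all three weight vectors (Crammer-Singer multi-class SVM)
clf = LinearSVC(multi_class="crammer_singer").fit(X, y)
print(clf.score(X, y))
```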

What you need to know

• Maximizing margin

• Derivation of SVM formulation

• Slack variables and hinge loss

• Relationship between SVMs and logistic regression

– 0/1 loss

– Hinge loss

– Log loss

• Tackling multiple classes

– One against all

– Multiclass SVMs


SVMs reminder

min_{w,b} w·w + C Σ_j ξ_j

s.t. (w·x_j + b) y_j ≥ 1 − ξ_j  ∀ j
     ξ_j ≥ 0  ∀ j

Regularization (w·w) + hinge loss (C Σ_j ξ_j)

Soft margin approach

Today’s Lecture

• Learn one of the most interesting and exciting recent advancements in machine learning

– The “kernel trick”

– High dimensional feature spaces at no extra cost!

• But first, a detour

– Constrained optimization!


Constrained Optimization


Lagrange Multiplier – Dual Variables


Moving the constraint into the objective function gives the Lagrangian:

Solve: minimize the Lagrangian over the primal variable, maximize over the dual variable (Lagrange multiplier) α ≥ 0

Constraint is tight when α > 0
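In symbols (my sign convention, assuming a single constraint written as g(x) ≥ 0):

```latex
\min_x f(x) \;\; \text{s.t. } g(x) \ge 0
\qquad\Longrightarrow\qquad
L(x, \alpha) = f(x) - \alpha\, g(x), \quad \alpha \ge 0
```

Setting ∂L/∂x = 0 and maximizing over α ≥ 0 gives the solution, with either α = 0 (constraint inactive) or g(x) = 0 (constraint tight, α > 0).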

Duality


Primal problem vs. dual problem (min-max form sketched below)

Weak duality – for all feasible points, the dual objective lower-bounds the primal objective

Strong duality – the primal and dual optimal values coincide (holds under KKT conditions)
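Written out in terms of the Lagrangian (standard form; the notation p*, d* is mine):

```latex
p^* = \min_x \; \max_{\alpha \ge 0} L(x, \alpha)
\qquad\qquad
d^* = \max_{\alpha \ge 0} \; \min_x L(x, \alpha)
```

Weak duality says d* ≤ p* (max-min never exceeds min-max); strong duality says d* = p*.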

Lagrange Multiplier – Dual Variables


Solving: consider the two cases, b negative and b positive (see the worked example below)

When α > 0, the constraint is tight
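A worked toy instance of the two cases (my own example, not necessarily the one in the slide's figure): minimize x² subject to x ≥ b.

```latex
L(x, \alpha) = x^2 - \alpha (x - b), \quad \alpha \ge 0;
\qquad
\frac{\partial L}{\partial x} = 2x - \alpha = 0 \;\Rightarrow\; x = \tfrac{\alpha}{2}
```

If b ≤ 0 (b negative), the unconstrained minimizer x = 0 is already feasible, so α = 0 and the constraint is inactive. If b > 0 (b positive), the constraint is tight: x = b and α = 2b > 0.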
