
Page 1:

Lecture Slides for

INTRODUCTION TO MACHINE LEARNING, 3RD EDITION

ETHEM ALPAYDIN

© The MIT Press, 2014

[email protected]

http://www.cmpe.boun.edu.tr/~ethem/i2ml3e

Page 2:

CHAPTER 10:

LINEAR DISCRIMINATION

Page 3:

Likelihood- vs. Discriminant-based Classification 3

Likelihood-based: Assume a model for p(x|Ci), use Bayes' rule to calculate P(Ci|x)

gi(x) = log P(Ci|x)

Discriminant-based: Assume a model for gi(x|Φi); no density estimation

Estimating the boundaries is enough; no need to accurately estimate the densities inside the boundaries

Page 4:

Linear Discriminant 4

Linear discriminant:

$$g_i(\mathbf{x} \mid \mathbf{w}_i, w_{i0}) = \mathbf{w}_i^T \mathbf{x} + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}$$

Advantages:

Simple: O(d) space/computation

Knowledge extraction: Weighted sum of attributes; positive/negative weights, magnitudes (credit scoring)

Optimal when p(x|Ci) are Gaussian with shared cov matrix; useful when classes are (almost) linearly separable
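As a minimal illustrative sketch (not from the slides; the weights and input below are made up), the linear discriminant is just a weighted sum of the attributes plus a bias:

```python
import numpy as np

# Hypothetical weights for one class's discriminant g_i(x) = w_i^T x + w_i0 (d = 3 attributes)
w_i = np.array([0.8, -1.2, 0.5])
w_i0 = 0.1

def g(x, w, w0):
    """Linear discriminant: weighted sum of attributes plus a bias term."""
    return w @ x + w0

x = np.array([1.0, 0.3, -0.7])
print(g(x, w_i, w_i0))  # sign and magnitude of each weight are interpretable (e.g., credit scoring)
```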

Page 5:

Generalized Linear Model 5

Quadratic discriminant:

$$g_i(\mathbf{x} \mid \mathbf{W}_i, \mathbf{w}_i, w_{i0}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

Higher-order (product) terms:

$$z_1 = x_1, \quad z_2 = x_2, \quad z_3 = x_1^2, \quad z_4 = x_2^2, \quad z_5 = x_1 x_2$$

Map from x to z using nonlinear basis functions and use a linear discriminant in z-space:

$$g_i(\mathbf{x}) = \sum_{j=1}^{k} w_{ij} \phi_{ij}(\mathbf{x})$$
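A minimal sketch of this idea, using the two-dimensional product-term mapping above and made-up weights:

```python
import numpy as np

def quadratic_map(x):
    """Map x = (x1, x2) to z-space using the product terms z1..z5 from the slide."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

# Hypothetical weights for a discriminant that is linear in z (hence quadratic in x)
w = np.array([0.4, -0.9, 0.2, 0.1, -0.3])
w0 = 0.05

x = np.array([1.5, -0.5])
z = quadratic_map(x)
print(w @ z + w0)
```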

Page 6:

Two Classes 6

$$\begin{aligned}
g(\mathbf{x}) &= g_1(\mathbf{x}) - g_2(\mathbf{x}) \\
&= \left(\mathbf{w}_1^T \mathbf{x} + w_{10}\right) - \left(\mathbf{w}_2^T \mathbf{x} + w_{20}\right) \\
&= \left(\mathbf{w}_1 - \mathbf{w}_2\right)^T \mathbf{x} + \left(w_{10} - w_{20}\right) \\
&= \mathbf{w}^T \mathbf{x} + w_0
\end{aligned}$$

Choose C1 if g(x) > 0, and C2 otherwise

Page 7:

Geometry 7

Page 8:

Multiple Classes 8

$$g_i(\mathbf{x} \mid \mathbf{w}_i, w_{i0}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

Choose C_i if

$$g_i(\mathbf{x}) = \max_{j=1}^{K} g_j(\mathbf{x})$$

Classes are linearly separable
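A minimal sketch of this decision rule, with made-up weights for K = 3 classes and d = 2 attributes:

```python
import numpy as np

# Hypothetical per-class weights (K x d) and biases (K,)
W = np.array([[1.0, -0.5],
              [-0.2, 0.8],
              [0.3, 0.3]])
w0 = np.array([0.0, 0.1, -0.2])

def classify(x):
    """Evaluate every linear discriminant g_i(x) = w_i^T x + w_i0 and pick the maximum."""
    g = W @ x + w0
    return int(np.argmax(g))

print(classify(np.array([0.4, 1.2])))
```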

Page 9:

Pairwise Separation 9

$$g_{ij}(\mathbf{x} \mid \mathbf{w}_{ij}, w_{ij0}) = \mathbf{w}_{ij}^T \mathbf{x} + w_{ij0}$$

$$g_{ij}(\mathbf{x}) \;\begin{cases} > 0 & \text{if } \mathbf{x} \in C_i \\ \le 0 & \text{if } \mathbf{x} \in C_j \\ \text{don't care} & \text{otherwise} \end{cases}$$

Choose C_i if ∀ j ≠ i, g_ij(x) > 0

Page 10:

From Discriminants to Posteriors 10

When p(x | Ci) ~ N(μi, Σ):

$$g_i(\mathbf{x} \mid \mathbf{w}_i, w_{i0}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

$$\mathbf{w}_i = \Sigma^{-1} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^T \Sigma^{-1} \boldsymbol{\mu}_i + \log P(C_i)$$

y = P(C1 | x) and P(C2 | x) = 1 - y

choose C1 if y > 0.5, and C2 otherwise
choose C1 if y / (1 - y) > 1, and C2 otherwise
choose C1 if log[y / (1 - y)] > 0, and C2 otherwise

Page 11:

11

$$\begin{aligned}
\operatorname{logit}\!\left(P(C_1 \mid \mathbf{x})\right) &= \log \frac{P(C_1 \mid \mathbf{x})}{1 - P(C_1 \mid \mathbf{x})} = \log \frac{p(\mathbf{x} \mid C_1)}{p(\mathbf{x} \mid C_2)} + \log \frac{P(C_1)}{P(C_2)} \\
&= \log \frac{(2\pi)^{-d/2} |\Sigma|^{-1/2} \exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_1)\right]}{(2\pi)^{-d/2} |\Sigma|^{-1/2} \exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_2)^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_2)\right]} + \log \frac{P(C_1)}{P(C_2)} \\
&= \mathbf{w}^T \mathbf{x} + w_0
\end{aligned}$$

where

$$\mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2), \qquad w_0 = -\frac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)^T \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) + \log \frac{P(C_1)}{P(C_2)}$$

The inverse of the logit:

$$\log \frac{P(C_1 \mid \mathbf{x})}{1 - P(C_1 \mid \mathbf{x})} = \mathbf{w}^T \mathbf{x} + w_0 \;\Longrightarrow\; P(C_1 \mid \mathbf{x}) = \operatorname{sigmoid}(\mathbf{w}^T \mathbf{x} + w_0) = \frac{1}{1 + \exp\!\left[-(\mathbf{w}^T \mathbf{x} + w_0)\right]}$$
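A minimal numeric sketch of these formulas; the class means, shared covariance, and equal priors below are made-up values for illustration only:

```python
import numpy as np

# Illustrative class parameters (not from the slides)
mu1 = np.array([1.0, 0.5])
mu2 = np.array([-0.5, -1.0])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.5]])
P1, P2 = 0.5, 0.5

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)                                          # w = Σ^{-1}(μ1 - μ2)
w0 = -0.5 * (mu1 + mu2) @ Sigma_inv @ (mu1 - mu2) + np.log(P1 / P2)  # bias term

def posterior_C1(x):
    """P(C1 | x) = sigmoid(w^T x + w0) for Gaussian classes with a shared covariance."""
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))

x = np.array([0.3, 0.1])
print(posterior_C1(x))   # choose C1 if this exceeds 0.5
```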

Page 12:

Sigmoid (Logistic) Function 12

Calculate g(x) = w^T x + w0 and choose C1 if g(x) > 0, or

calculate y = sigmoid(w^T x + w0) and choose C1 if y > 0.5

Page 13:

Gradient-Descent 13

E(w | X) is error with parameters w on sample X

w* = arg min_w E(w | X)

Gradient:

$$\nabla_{\mathbf{w}} E = \left[\frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_d}\right]^T$$

Gradient-descent: Starts from random w and updates w iteratively in the negative direction of gradient

Page 14:

Gradient-Descent 14

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i}, \quad \forall i$$

$$w_i \leftarrow w_i + \Delta w_i$$

[Figure: one gradient-descent step of size η on the error surface, from E(w_t) to E(w_{t+1})]
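A minimal sketch of this loop; the toy objective, learning rate, and iteration count are illustrative assumptions, not from the slides:

```python
import numpy as np

def gradient_descent(grad_E, w_init, eta=0.1, n_iters=100):
    """Start from an initial w and repeatedly move in the negative gradient direction."""
    w = np.array(w_init, dtype=float)
    for _ in range(n_iters):
        w += -eta * grad_E(w)    # Δw = -η ∂E/∂w
    return w

# Toy objective E(w) = ||w - 3||^2, whose gradient is 2(w - 3); the minimum is at w = 3
grad = lambda w: 2.0 * (w - 3.0)
print(gradient_descent(grad, w_init=[0.0, 10.0]))
```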

Page 15:

Logistic Discrimination 15

Two classes: Assume the log likelihood ratio is linear:

$$\log \frac{p(\mathbf{x} \mid C_1)}{p(\mathbf{x} \mid C_2)} = \mathbf{w}^T \mathbf{x} + w_0^o$$

$$\operatorname{logit}\!\left(P(C_1 \mid \mathbf{x})\right) = \log \frac{P(C_1 \mid \mathbf{x})}{1 - P(C_1 \mid \mathbf{x})} = \log \frac{p(\mathbf{x} \mid C_1)}{p(\mathbf{x} \mid C_2)} + \log \frac{P(C_1)}{P(C_2)} = \mathbf{w}^T \mathbf{x} + w_0$$

where w0 = w0^o + log[P(C1)/P(C2)]

$$\hat{y} = \hat{P}(C_1 \mid \mathbf{x}) = \frac{1}{1 + \exp\!\left[-(\mathbf{w}^T \mathbf{x} + w_0)\right]}$$

Page 16:

Training: Two Classes 16

$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_t, \qquad r^t \mid \mathbf{x}^t \sim \text{Bernoulli}(y^t)$$

$$y^t = P(C_1 \mid \mathbf{x}^t) = \frac{1}{1 + \exp\!\left[-(\mathbf{w}^T \mathbf{x}^t + w_0)\right]}$$

$$l(\mathbf{w}, w_0 \mid \mathcal{X}) = \prod_t \left(y^t\right)^{r^t} \left(1 - y^t\right)^{1 - r^t}$$

$$E = -\log l$$

$$E(\mathbf{w}, w_0 \mid \mathcal{X}) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log (1 - y^t) \right]$$

Page 17:

Training: Gradient-Descent 17

$$E(\mathbf{w}, w_0 \mid \mathcal{X}) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log (1 - y^t) \right]$$

If y = sigmoid(a), then dy/da = y(1 - y)

$$\Delta w_j = -\eta \frac{\partial E}{\partial w_j} = \eta \sum_t \left( \frac{r^t}{y^t} - \frac{1 - r^t}{1 - y^t} \right) y^t \left(1 - y^t\right) x_j^t = \eta \sum_t \left(r^t - y^t\right) x_j^t, \quad j = 1, \ldots, d$$

$$\Delta w_0 = \eta \sum_t \left(r^t - y^t\right)$$
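A minimal batch-gradient-descent sketch of these updates; the learning rate, epoch count, initialization range, and tiny data set are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_logistic(X, r, eta=0.1, n_epochs=1000):
    """Two-class logistic discrimination by batch gradient descent.
    X is an N x d data matrix, r is a length-N vector of 0/1 labels."""
    N, d = X.shape
    w = np.random.uniform(-0.01, 0.01, d)    # small random initial weights
    w0 = 0.0
    for _ in range(n_epochs):
        y = sigmoid(X @ w + w0)              # y^t = sigmoid(w^T x^t + w0)
        w += eta * X.T @ (r - y)             # Δw_j = η Σ_t (r^t - y^t) x_j^t
        w0 += eta * np.sum(r - y)            # Δw_0 = η Σ_t (r^t - y^t)
    return w, w0

# Tiny synthetic check: two well-separated 1-D clusters
X = np.array([[-2.0], [-1.5], [1.5], [2.0]])
r = np.array([0, 0, 1, 1])
w, w0 = train_logistic(X, r)
print(sigmoid(X @ w + w0))   # posteriors should approach 0, 0, 1, 1
```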

Page 18:

18

Page 19:

19

[Figure: after 10, 100, and 1000 iterations]

Page 20:

K>2 Classes 20

$$\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_t, \qquad \mathbf{r}^t \mid \mathbf{x}^t \sim \text{Mult}_K(1, \mathbf{y}^t)$$

Assume the log likelihood ratios are linear:

$$\log \frac{p(\mathbf{x} \mid C_i)}{p(\mathbf{x} \mid C_K)} = \mathbf{w}_i^T \mathbf{x} + w_{i0}^o$$

$$y_i = \hat{P}(C_i \mid \mathbf{x}) = \frac{\exp\!\left[\mathbf{w}_i^T \mathbf{x} + w_{i0}\right]}{\sum_{j=1}^{K} \exp\!\left[\mathbf{w}_j^T \mathbf{x} + w_{j0}\right]}, \quad i = 1, \ldots, K \qquad \text{(softmax)}$$

$$l\!\left(\{\mathbf{w}_i, w_{i0}\}_i \mid \mathcal{X}\right) = \prod_t \prod_i \left(y_i^t\right)^{r_i^t}$$

$$E\!\left(\{\mathbf{w}_i, w_{i0}\}_i \mid \mathcal{X}\right) = -\sum_t \sum_i r_i^t \log y_i^t$$

$$\Delta \mathbf{w}_j = \eta \sum_t \left(r_j^t - y_j^t\right) \mathbf{x}^t, \qquad \Delta w_{j0} = \eta \sum_t \left(r_j^t - y_j^t\right)$$
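A minimal sketch of softmax training with these batch updates; the learning rate, epoch count, and initialization are illustrative assumptions:

```python
import numpy as np

def softmax(A):
    """Row-wise softmax: y_i = exp(a_i) / sum_j exp(a_j)."""
    A = A - A.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)

def train_softmax(X, R, eta=0.1, n_epochs=1000):
    """K>2 classes by batch gradient descent.
    X is N x d; R is N x K with one-hot rows r^t."""
    N, d = X.shape
    K = R.shape[1]
    W = np.random.uniform(-0.01, 0.01, (d, K))
    w0 = np.zeros(K)
    for _ in range(n_epochs):
        Y = softmax(X @ W + w0)              # y_i^t from the K linear discriminants
        W += eta * X.T @ (R - Y)             # Δw_j = η Σ_t (r_j^t - y_j^t) x^t
        w0 += eta * (R - Y).sum(axis=0)      # Δw_j0 = η Σ_t (r_j^t - y_j^t)
    return W, w0

# Usage (toy): pass an N x d matrix X and an N x K one-hot label matrix R
# W, w0 = train_softmax(X, R)
```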

Page 21:

21

Page 22:

Example 22

Page 23:

Generalizing the Linear Model 23

Quadratic:

$$\log \frac{p(\mathbf{x} \mid C_i)}{p(\mathbf{x} \mid C_K)} = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

Sum of basis functions:

$$\log \frac{p(\mathbf{x} \mid C_i)}{p(\mathbf{x} \mid C_K)} = \mathbf{w}_i^T \boldsymbol{\phi}(\mathbf{x}) + w_{i0}$$

where φ(x) are basis functions. Examples:

Hidden units in neural networks (Chapters 11 and 12)

Kernels in SVM (Chapter 13)

Page 24:

Discrimination by Regression 24

$$r^t = y^t + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$

$$y^t = \operatorname{sigmoid}(\mathbf{w}^T \mathbf{x}^t + w_0) = \frac{1}{1 + \exp\!\left[-(\mathbf{w}^T \mathbf{x}^t + w_0)\right]}$$

$$l(\mathbf{w}, w_0 \mid \mathcal{X}) = \prod_t \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(r^t - y^t)^2}{2\sigma^2}\right]$$

$$E(\mathbf{w}, w_0 \mid \mathcal{X}) = \frac{1}{2} \sum_t \left(r^t - y^t\right)^2$$

$$\Delta \mathbf{w} = \eta \sum_t \left(r^t - y^t\right) y^t \left(1 - y^t\right) \mathbf{x}^t$$

Classes are NOT mutually exclusive and exhaustive
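A minimal sketch of the sum-of-squares update above; the learning rate, epoch count, and initialization are illustrative assumptions. Note the extra y(1 - y) factor compared with the logistic-discrimination update Δw_j = η Σ_t (r^t - y^t) x_j^t:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_regression_discriminant(X, r, eta=0.1, n_epochs=1000):
    """Sum-of-squares training of a sigmoid output.
    X is an N x d data matrix, r is a length-N vector of 0/1 targets."""
    N, d = X.shape
    w = np.random.uniform(-0.01, 0.01, d)
    w0 = 0.0
    for _ in range(n_epochs):
        y = sigmoid(X @ w + w0)
        factor = (r - y) * y * (1.0 - y)     # (r^t - y^t) y^t (1 - y^t)
        w += eta * X.T @ factor              # Δw = η Σ_t (r^t - y^t) y^t (1 - y^t) x^t
        w0 += eta * factor.sum()
    return w, w0
```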

Page 25:

Learning to Rank 25

Ranking: A different problem than classification or regression

Let us say xu and xv are two instances, e.g., two movies

We prefer u to v implies that g(xu) > g(xv), where g(x) is a score function, here linear: g(x) = wTx

Find a direction w such that we get the desired ranks when instances are projected along w

Page 26:

Ranking Error 26

We prefer u to v implies that g(xu) > g(xv), so the error is g(xv) - g(xu) if g(xu) < g(xv), and zero otherwise
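A minimal illustration of learning such a direction w: for every violated pair the error g(xv) - g(xu) is reduced by a gradient step. The update below, the learning rate, the epoch count, and the tiny "movie" data are illustrative assumptions, not the exact procedure from the slides:

```python
import numpy as np

def train_ranker(X, prefer_pairs, eta=0.01, n_epochs=200):
    """Learn w so that preferred instances get higher scores g(x) = w^T x.
    prefer_pairs is a list of (u, v) index pairs meaning 'u is preferred to v'."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for u, v in prefer_pairs:
            if X[u] @ w <= X[v] @ w:          # violated pair: g(x_u) <= g(x_v)
                w += eta * (X[u] - X[v])      # step that increases g(x_u) - g(x_v)
    return w

# Tiny illustrative example: three 'movies' described by two made-up features
X = np.array([[1.0, 0.2],
              [0.5, 0.8],
              [0.1, 0.1]])
prefer_pairs = [(0, 1), (1, 2)]               # prefer movie 0 to 1, and movie 1 to 2
w = train_ranker(X, prefer_pairs)
print(X @ w)                                  # scores should respect the stated preferences
```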

Page 27:

27