Linear Classifiers

Feature Vectors
Hello,
Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just

# free : 2
YOUR_NAME : 0
MISSPELLED : 2
FROM_FRIEND : 0
...

→ SPAM or +

PIXEL-7,12 : 1
PIXEL-7,13 : 0
...
NUM_LOOPS : 1
...

→ "2"
Some (Simplified) Biology
§ Very loose inspiration: human neurons
Linear Classifiers
§ Inputs are feature values
§ Each feature has a weight
§ Sum is the activation: activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
§ If the activation is:
  § Positive, output +1
  § Negative, output -1

[Figure: inputs f1, f2, f3 weighted by w1, w2, w3, summed (Σ), then thresholded: >0?]
Weights
§ Binary case: compare features to a weight vector
§ Learning: figure out the weight vector from examples

f(x1): # free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ...
w:     # free : 4, YOUR_NAME : -1, MISSPELLED : 1, FROM_FRIEND : -3, ...
f(x2): # free : 0, YOUR_NAME : 1, MISSPELLED : 1, FROM_FRIEND : 1, ...

A positive dot product w · f(x) means the positive class.
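The decision rule above can be sketched in a few lines; this is a minimal illustration assuming dict-based feature vectors (the names `score` and `classify` are ours, not from the slides):

```python
# Binary linear classification via a dot product over feature dicts.
def score(weights, features):
    """Activation: dot product of a weight dict and a feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def classify(weights, features):
    """Output +1 if the activation is positive, else -1."""
    return +1 if score(weights, features) > 0 else -1

# The slide's spam example: activation = 4*2 + 1*2 = 10 > 0, so +1 (SPAM).
w = {"# free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}
x = {"# free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
label = classify(w, x)
```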
Decision Rules
Binary Decision Rule
§ In the space of feature vectors:
  § Examples are points
  § Any weight vector is a hyperplane
  § One side corresponds to Y = +1
  § Other corresponds to Y = -1

w: BIAS : -3, free : 4, money : 2, ...
[Figure: decision boundary in the (free, money) feature plane; the +1 side is SPAM, the -1 side is HAM]
Weight Updates
Learning: Binary Perceptron
§ Start with weights = 0
§ For each training instance:
  § Classify with current weights
  § If correct (i.e., y = y*), no change!
  § If wrong: adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is -1.
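The update rule above can be sketched as follows; a minimal version assuming dict-based features and a fixed number of passes (both illustrative choices, not from the slides):

```python
# Binary perceptron: add or subtract the feature vector on mistakes.
def perceptron_train(data, num_passes=10):
    """data: list of (feature_dict, label) pairs with label in {+1, -1}."""
    w = {}  # start with weights = 0
    for _ in range(num_passes):
        for feats, y_star in data:
            # Classify with current weights.
            activation = sum(w.get(f, 0.0) * v for f, v in feats.items())
            y = +1 if activation > 0 else -1
            if y != y_star:
                # Wrong: w <- w + y* * f(x) (subtract if y* is -1).
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + y_star * v
    return w

# Tiny separable example (hypothetical data).
data = [({"bias": 1, "free": 2}, +1), ({"bias": 1, "free": 0}, -1)]
weights = perceptron_train(data)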
Examples: Perceptron
§ Separable Case
Multiclass Decision Rule
§ If we have multiple classes:
  § A weight vector for each class: w_y
  § Score (activation) of a class y: w_y · f(x)
  § Prediction: highest score wins, y = argmax_y w_y · f(x)

Binary = multiclass where the negative class has weight zero.
Learning: Multiclass Perceptron
§ Start with all weights = 0
§ Pick up training examples one by one
§ Predict with current weights
§ If correct, no change!
§ If wrong: lower the score of the wrong answer, raise the score of the right answer
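The multiclass update can be sketched like this; a minimal version with dict-based per-class weight vectors (class names and data below are illustrative, not the slide's):

```python
# Multiclass perceptron: on a mistake, raise the right class's weights
# and lower the wrong class's by the feature vector.
def multiclass_train(data, classes, num_passes=10):
    """data: list of (feature_dict, true_class); classes: list of labels."""
    w = {c: {} for c in classes}  # a weight vector per class, all zeros
    for _ in range(num_passes):
        for feats, y_star in data:
            # Predict: highest score w_y . f(x) wins.
            y = max(classes, key=lambda c: sum(
                w[c].get(f, 0.0) * v for f, v in feats.items()))
            if y != y_star:
                for f, v in feats.items():
                    w[y_star][f] = w[y_star].get(f, 0.0) + v  # raise right
                    w[y][f] = w[y].get(f, 0.0) - v            # lower wrong
    return w

# Hypothetical two-class example.
classes = ["SPORTS", "POLITICS"]
data = [({"win": 1, "game": 1}, "SPORTS"),
        ({"win": 1, "vote": 1}, "POLITICS")]
weights = multiclass_train(data, classes)
```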
Example: Multiclass Perceptron
Training sentences: "win the vote", "win the election", "win the game"

Initial weight vectors (one per class):
BIAS : 1, win : 0, game : 0, vote : 0, the : 0, ...
BIAS : 0, win : 0, game : 0, vote : 0, the : 0, ...
BIAS : 0, win : 0, game : 0, vote : 0, the : 0, ...
Properties of Perceptrons
§ Separability: true if some parameters get the training set perfectly correct
§ Convergence: if the training data is separable, the perceptron will eventually converge (binary case)
§ Mistake Bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability

[Figure: separable vs. non-separable data sets]
Problems with the Perceptron
§ Noise: if the data isn't separable, weights might thrash
  § Averaging weight vectors over time can help (averaged perceptron)
§ Mediocre generalization: finds a "barely" separating solution
§ Overtraining: test / held-out accuracy usually rises, then falls
  § Overtraining is a kind of overfitting
Improving the Perceptron
Non-Separable Case: Deterministic Decision
Even the best linear boundary makes at least one mistake.
Non-Separable Case: Probabilistic Decision
[Figure: points near the boundary get uncertain label probabilities such as 0.5 | 0.5 and 0.3 | 0.7; points far from it get confident ones such as 0.1 | 0.9 and 0.9 | 0.1]
How to get probabilistic decisions?
§ Perceptron scoring: z = w · f(x)
§ If z is very positive → want probability going to 1
§ If z is very negative → want probability going to 0
§ Sigmoid function:

φ(z) = 1 / (1 + e^(-z))
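The sigmoid is one line of code; a minimal sketch of the squashing behavior described above:

```python
import math

def sigmoid(z):
    """Squash a perceptron score z = w . f(x) into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Very positive z -> probability near 1; very negative z -> near 0;
# z = 0 (a point on the boundary) -> exactly 0.5.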
Best w?
§ Maximum likelihood estimation:

max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

with:

P(y^(i) = +1 | x^(i); w) = 1 / (1 + e^(-w · f(x^(i))))
P(y^(i) = -1 | x^(i); w) = 1 - 1 / (1 + e^(-w · f(x^(i))))

= Logistic Regression
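The objective above can be computed directly; a minimal sketch assuming dict-based weights and features (our representation, not the slides'), using the identity that both cases collapse to log P(y | x; w) = -log(1 + e^(-y z)):

```python
import math

def log_likelihood(w, data):
    """Sum of log P(y^(i) | x^(i); w) over (feature_dict, label) pairs,
    with labels in {+1, -1}."""
    ll = 0.0
    for feats, y in data:
        z = sum(w.get(f, 0.0) * v for f, v in feats.items())
        # y = +1: log 1/(1+e^{-z}); y = -1: log(1 - 1/(1+e^{-z})) = log 1/(1+e^{z}).
        ll += -math.log(1.0 + math.exp(-y * z))
    return ll
```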
Separable Case: Deterministic Decision – Many Options

Separable Case: Probabilistic Decision – Clear Preference
[Figure: two separating boundaries; under the probabilistic view (label probabilities such as 0.5 | 0.5, 0.3 | 0.7, 0.7 | 0.3), the boundary that keeps points confidently classified is preferred]
Multiclass Logistic Regression
§ Recall Perceptron:
  § A weight vector for each class: w_y
  § Score (activation) of a class y: w_y · f(x)
  § Prediction: highest score wins
§ How to make the scores into probabilities?

z1, z2, z3 → e^z1 / (e^z1 + e^z2 + e^z3), e^z2 / (e^z1 + e^z2 + e^z3), e^z3 / (e^z1 + e^z2 + e^z3)

original activations → softmax activations
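The softmax transformation above, as a minimal sketch (note that production code usually subtracts max(z) first for numerical stability; that detail is omitted here):

```python
import math

def softmax(zs):
    """Turn a list of activations into probabilities that sum to 1."""
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```

Softmax preserves the ordering of the activations, so the highest-scoring class also gets the highest probability.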
Best w?
§ Maximum likelihood estimation:

max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

with:

P(y^(i) | x^(i); w) = e^(w_y(i) · f(x^(i))) / Σ_y e^(w_y · f(x^(i)))

= Multi-Class Logistic Regression
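The multiclass objective can be computed the same way as the binary one; a minimal sketch assuming per-class weight dicts (our representation), using log softmax = score of the true class minus the log normalizer:

```python
import math

def multiclass_log_likelihood(w, data):
    """w: {class: weight_dict}; data: list of (feature_dict, true_class)."""
    ll = 0.0
    for feats, y in data:
        scores = {c: sum(wc.get(f, 0.0) * v for f, v in feats.items())
                  for c, wc in w.items()}
        log_norm = math.log(sum(math.exp(s) for s in scores.values()))
        ll += scores[y] - log_norm  # log of the softmax probability of y
    return ll
```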
Next Lecture
§ Optimization
§ i.e., how do we solve:

max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)