Linear Classifiers

Feature Vectors
Hello,
Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just

# free : 2
YOUR_NAME : 0
MISSPELLED : 2
FROM_FRIEND : 0
...

→ SPAM or +

PIXEL-7,12 : 1
PIXEL-7,13 : 0
...
NUM_LOOPS : 1
...

→ "2"
Some (Simplified) Biology
§ Very loose inspiration: human neurons
Linear Classifiers
§ Inputs are feature values
§ Each feature has a weight
§ Sum is the activation: activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
§ If the activation is:
  § Positive, output +1
  § Negative, output -1

[Figure: inputs f1, f2, f3 weighted by w1, w2, w3, summed (Σ), then thresholded: >0?]
Weights
§ Binary case: compare features to a weight vector
§ Learning: figure out the weight vector from examples

f(x1): # free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ...
w:     # free : 4, YOUR_NAME : -1, MISSPELLED : 1, FROM_FRIEND : -3, ...
f(x2): # free : 0, YOUR_NAME : 1, MISSPELLED : 1, FROM_FRIEND : 1, ...

A positive dot product w · f(x) means the positive class.
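The decision rule above can be sketched in a few lines; this is a minimal illustration assuming dict-based feature vectors (the names `score` and `classify` are ours, not from the slides):

```python
# Binary linear classification via a dot product over feature dicts.
def score(weights, features):
    """Activation: dot product of a weight dict and a feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def classify(weights, features):
    """Output +1 if the activation is positive, else -1."""
    return +1 if score(weights, features) > 0 else -1

# The slide's spam example: activation = 4*2 + 1*2 = 10 > 0, so +1 (SPAM).
w = {"# free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}
x = {"# free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
label = classify(w, x)
```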
Decision Rules
Binary Decision Rule
§ In the space of feature vectors:
  § Examples are points
  § Any weight vector is a hyperplane
  § One side corresponds to Y = +1
  § Other corresponds to Y = -1

w: BIAS : -3, free : 4, money : 2, ...
[Figure: decision boundary in the (free, money) feature plane; the +1 side is SPAM, the -1 side is HAM]
Weight Updates
Learning: Binary Perceptron
§ Start with weights = 0
§ For each training instance:
  § Classify with current weights
  § If correct (i.e., y = y*), no change!
  § If wrong: adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is -1.
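The update rule above can be sketched as follows; a minimal version assuming dict-based features and a fixed number of passes (both illustrative choices, not from the slides):

```python
# Binary perceptron: add or subtract the feature vector on mistakes.
def perceptron_train(data, num_passes=10):
    """data: list of (feature_dict, label) pairs with label in {+1, -1}."""
    w = {}  # start with weights = 0
    for _ in range(num_passes):
        for feats, y_star in data:
            # Classify with current weights.
            activation = sum(w.get(f, 0.0) * v for f, v in feats.items())
            y = +1 if activation > 0 else -1
            if y != y_star:
                # Wrong: w <- w + y* * f(x) (subtract if y* is -1).
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + y_star * v
    return w

# Tiny separable example (hypothetical data).
data = [({"bias": 1, "free": 2}, +1), ({"bias": 1, "free": 0}, -1)]
weights = perceptron_train(data)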
Examples: Perceptron
§ Separable Case
Multiclass Decision Rule
§ If we have multiple classes:
  § A weight vector for each class: w_y
  § Score (activation) of a class y: w_y · f(x)
  § Prediction: highest score wins, y = argmax_y w_y · f(x)

Binary = multiclass where the negative class has weight zero.
Learning: Multiclass Perceptron
§ Start with all weights = 0
§ Pick up training examples one by one
§ Predict with current weights
§ If correct, no change!
§ If wrong: lower the score of the wrong answer, raise the score of the right answer
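The multiclass update can be sketched like this; a minimal version with dict-based per-class weight vectors (class names and data below are illustrative, not the slide's):

```python
# Multiclass perceptron: on a mistake, raise the right class's weights
# and lower the wrong class's by the feature vector.
def multiclass_train(data, classes, num_passes=10):
    """data: list of (feature_dict, true_class); classes: list of labels."""
    w = {c: {} for c in classes}  # a weight vector per class, all zeros
    for _ in range(num_passes):
        for feats, y_star in data:
            # Predict: highest score w_y . f(x) wins.
            y = max(classes, key=lambda c: sum(
                w[c].get(f, 0.0) * v for f, v in feats.items()))
            if y != y_star:
                for f, v in feats.items():
                    w[y_star][f] = w[y_star].get(f, 0.0) + v  # raise right
                    w[y][f] = w[y].get(f, 0.0) - v            # lower wrong
    return w

# Hypothetical two-class example.
classes = ["SPORTS", "POLITICS"]
data = [({"win": 1, "game": 1}, "SPORTS"),
        ({"win": 1, "vote": 1}, "POLITICS")]
weights = multiclass_train(data, classes)
```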
Example: Multiclass Perceptron
Training sentences: "win the vote", "win the election", "win the game"

Initial weight vectors (one per class):
BIAS : 1, win : 0, game : 0, vote : 0, the : 0, ...
BIAS : 0, win : 0, game : 0, vote : 0, the : 0, ...
BIAS : 0, win : 0, game : 0, vote : 0, the : 0, ...
Properties of Perceptrons
§ Separability: true if some parameters get the training set perfectly correct
§ Convergence: if the training data is separable, the perceptron will eventually converge (binary case)
§ Mistake Bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability

[Figure: separable vs. non-separable data sets]
Problems with the Perceptron
§ Noise: if the data isn't separable, weights might thrash
  § Averaging weight vectors over time can help (averaged perceptron)
§ Mediocre generalization: finds a "barely" separating solution
§ Overtraining: test / held-out accuracy usually rises, then falls
  § Overtraining is a kind of overfitting
Improving the Perceptron
Non-Separable Case: Deterministic Decision
Even the best linear boundary makes at least one mistake.
Non-Separable Case: Probabilistic Decision
[Figure: points near the boundary get uncertain label probabilities such as 0.5 | 0.5 and 0.3 | 0.7; points far from it get confident ones such as 0.1 | 0.9 and 0.9 | 0.1]
How to get probabilistic decisions?
§ Perceptron scoring: z = w · f(x)
§ If z is very positive → want probability going to 1
§ If z is very negative → want probability going to 0
§ Sigmoid function:

φ(z) = 1 / (1 + e^(-z))
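The sigmoid is one line of code; a minimal sketch of the squashing behavior described above:

```python
import math

def sigmoid(z):
    """Squash a perceptron score z = w . f(x) into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Very positive z -> probability near 1; very negative z -> near 0;
# z = 0 (a point on the boundary) -> exactly 0.5.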
Best w?
§ Maximum likelihood estimation:

max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

with:

P(y^(i) = +1 | x^(i); w) = 1 / (1 + e^(-w · f(x^(i))))
P(y^(i) = -1 | x^(i); w) = 1 - 1 / (1 + e^(-w · f(x^(i))))

= Logistic Regression
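The objective above can be computed directly; a minimal sketch assuming dict-based weights and features (our representation, not the slides'), using the identity that both cases collapse to log P(y | x; w) = -log(1 + e^(-y z)):

```python
import math

def log_likelihood(w, data):
    """Sum of log P(y^(i) | x^(i); w) over (feature_dict, label) pairs,
    with labels in {+1, -1}."""
    ll = 0.0
    for feats, y in data:
        z = sum(w.get(f, 0.0) * v for f, v in feats.items())
        # y = +1: log 1/(1+e^{-z}); y = -1: log(1 - 1/(1+e^{-z})) = log 1/(1+e^{z}).
        ll += -math.log(1.0 + math.exp(-y * z))
    return ll
```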
Separable Case: Deterministic Decision – Many Options

Separable Case: Probabilistic Decision – Clear Preference
[Figure: two separating boundaries; under the probabilistic view (label probabilities such as 0.5 | 0.5, 0.3 | 0.7, 0.7 | 0.3), the boundary that keeps points confidently classified is preferred]
Multiclass Logistic Regression
§ Recall Perceptron:
  § A weight vector for each class: w_y
  § Score (activation) of a class y: w_y · f(x)
  § Prediction: highest score wins
§ How to make the scores into probabilities?

z1, z2, z3 → e^z1 / (e^z1 + e^z2 + e^z3), e^z2 / (e^z1 + e^z2 + e^z3), e^z3 / (e^z1 + e^z2 + e^z3)

original activations → softmax activations
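The softmax transformation above, as a minimal sketch (note that production code usually subtracts max(z) first for numerical stability; that detail is omitted here):

```python
import math

def softmax(zs):
    """Turn a list of activations into probabilities that sum to 1."""
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```

Softmax preserves the ordering of the activations, so the highest-scoring class also gets the highest probability.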
Best w?
§ Maximum likelihood estimation:

max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

with:

P(y^(i) | x^(i); w) = e^(w_y(i) · f(x^(i))) / Σ_y e^(w_y · f(x^(i)))

= Multi-Class Logistic Regression
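The multiclass objective can be computed the same way as the binary one; a minimal sketch assuming per-class weight dicts (our representation), using log softmax = score of the true class minus the log normalizer:

```python
import math

def multiclass_log_likelihood(w, data):
    """w: {class: weight_dict}; data: list of (feature_dict, true_class)."""
    ll = 0.0
    for feats, y in data:
        scores = {c: sum(wc.get(f, 0.0) * v for f, v in feats.items())
                  for c, wc in w.items()}
        log_norm = math.log(sum(math.exp(s) for s in scores.values()))
        ll += scores[y] - log_norm  # log of the softmax probability of y
    return ll
```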
Next Lecture
§ Optimization
§ i.e., how do we solve:

max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)