
Supervised Classifiers: Naïve Bayes, Perceptron, Logistic Regression

CMSC 723 / LING 723 / INST 725

MARINE CARPUAT

marine@cs.umd.edu

Some slides by Graham Neubig and Jacob Eisenstein

Today

• 3 linear classifiers

– Naïve Bayes

– Perceptron

– Logistic Regression

• Bag of words vs. rich feature sets

• Generative vs. discriminative models

• Bias-variance tradeoff

An online learning algorithm
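The slide's pseudocode isn't preserved in this extraction; below is a minimal sketch of an online perceptron training loop, assuming sparse feature dicts φ(x) and labels y ∈ {+1, -1} (names are illustrative):

# Online perceptron: predict, then update weights only on mistakes.
def train_perceptron(data, num_epochs=10):
    w = {}  # feature name -> weight
    for epoch in range(num_epochs):
        for phi, y in data:  # phi: dict feature -> value, y in {+1, -1}
            score = sum(w.get(f, 0.0) * v for f, v in phi.items())
            y_hat = 1 if score >= 0 else -1
            if y_hat != y:  # error-driven update
                for f, v in phi.items():
                    w[f] = w.get(f, 0.0) + y * v
    return w

For example: w = train_perceptron([({'great': 1.0}, 1), ({'awful': 1.0}, -1)])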

Perceptron weight update

• If y = 1, increase the weights for the features in φ(x)

• If y = -1, decrease the weights for the features in φ(x)
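Written out (the standard perceptron update, with y ∈ {+1, -1}): on a misclassified example,

$$w \leftarrow w + y\,\phi(x)$$

so a single rule covers both cases above.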

The perceptron

• A linear model for classification

• An algorithm to learn feature weights given labeled data

– online algorithm

– error-driven

– Does it converge?

• See “A Course In Machine Learning” Ch.3

When to stop?

• One technique: early stopping

– Stop training when accuracy on held-out data starts to decrease
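As a concrete sketch (same sparse-feature setup as the training loop above; helper names are illustrative):

# Early stopping: keep the weights from the epoch with the best
# held-out accuracy; stop once accuracy starts to decrease.
def accuracy(w, data):
    correct = 0
    for phi, y in data:
        score = sum(w.get(f, 0.0) * v for f, v in phi.items())
        correct += (1 if score >= 0 else -1) == y
    return correct / len(data)

def train_early_stopping(train_data, heldout_data, max_epochs=50):
    w, best_w, best_acc = {}, {}, -1.0
    for epoch in range(max_epochs):
        for phi, y in train_data:
            score = sum(w.get(f, 0.0) * v for f, v in phi.items())
            if (1 if score >= 0 else -1) != y:
                for f, v in phi.items():
                    w[f] = w.get(f, 0.0) + y * v
        acc = accuracy(w, heldout_data)
        if acc <= best_acc:
            break  # held-out accuracy stopped improving
        best_acc, best_w = acc, dict(w)
    return best_w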

Multiclass perceptron
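The slide body isn't preserved here; a common formulation keeps one weight vector per class, predicts the highest-scoring class, and on a mistake boosts the true class and demotes the predicted one. A sketch under those assumptions:

# Multiclass perceptron: one weight vector per label.
def train_multiclass(data, labels, num_epochs=10):
    w = {y: {} for y in labels}  # label -> {feature: weight}
    for epoch in range(num_epochs):
        for phi, y in data:
            y_hat = max(labels, key=lambda c: sum(
                w[c].get(f, 0.0) * v for f, v in phi.items()))
            if y_hat != y:
                for f, v in phi.items():
                    w[y][f] = w[y].get(f, 0.0) + v          # boost true class
                    w[y_hat][f] = w[y_hat].get(f, 0.0) - v  # demote prediction
    return w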

Averaged Perceptron
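Again the slide details aren't in the extraction; the standard averaged perceptron returns the average of the weight vector over all training steps, which usually generalizes better than the final weights. A naive sketch that keeps the idea visible:

# Averaged perceptron: return the mean of the weights across all steps.
def train_averaged(data, num_epochs=10):
    w, w_sum, steps = {}, {}, 0
    for epoch in range(num_epochs):
        for phi, y in data:
            score = sum(w.get(f, 0.0) * v for f, v in phi.items())
            if (1 if score >= 0 else -1) != y:
                for f, v in phi.items():
                    w[f] = w.get(f, 0.0) + y * v
            steps += 1
            for f, v in w.items():  # naive running sum over all features
                w_sum[f] = w_sum.get(f, 0.0) + v
    return {f: v / steps for f, v in w_sum.items()}

Real implementations avoid the inner accumulation loop with per-feature timestamps; the naive version above trades speed for clarity.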

What objective/loss does the perceptron optimize?

• Zero-one loss function

• What are the pros and cons compared to the Naïve Bayes loss?

Perceptron & Probabilities

• What if we want a probability p(y|x)?

• The perceptron only gives us a hard prediction ŷ, not a probability

The logistic function

• A “softer” decision function than the perceptron's hard threshold

• Can account for uncertainty

• Differentiable
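For reference, the standard definition and the resulting binary model (using the features φ(x) and weights w from before):

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad p(y = 1 \mid x) = \sigma\bigl(w \cdot \phi(x)\bigr)$$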

Logistic regression: how to train?

• Train based on conditional likelihood

• Find parameters $w$ that maximize the conditional likelihood of all answers $y_i$ given examples $x_i$
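In symbols, the standard maximum conditional likelihood objective:

$$\hat{w} = \arg\max_{w} \sum_{i} \log p(y_i \mid x_i; w)$$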

Stochastic gradient ascent (or descent)

• Online training algorithm for logistic regression

– and other probabilistic models

• Update weights for every training example

• Move in the direction given by the gradient

• Size of the update step is scaled by the learning rate
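A minimal sketch of one epoch, assuming sparse feature dicts and labels y ∈ {0, 1}; the gradient it applies is derived on the next slide:

import math

# One epoch of stochastic gradient ascent for logistic regression.
def sgd_epoch(w, data, alpha=0.1):
    for phi, y in data:  # phi: dict feature -> value, y in {0, 1}
        z = sum(w.get(f, 0.0) * v for f, v in phi.items())
        p = 1.0 / (1.0 + math.exp(-z))       # p(y = 1 | x)
        for f, v in phi.items():             # ascend the log-likelihood
            w[f] = w.get(f, 0.0) + alpha * (y - p) * v
    return w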

Gradient of the logistic function
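The derivation itself isn't preserved in this extraction; the standard result for one example, with $y \in \{0, 1\}$ and the model above, is

$$\frac{\partial}{\partial w} \log p(y \mid x; w) = \bigl(y - \sigma(w \cdot \phi(x))\bigr)\,\phi(x)$$

which is exactly the update applied in the SGD sketch above.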

Example: initial update

Example: second update

How to set the learning rate?

• Various strategies

• Decay over time:

$$\alpha = \frac{1}{C + t}$$

where $C$ is a parameter and $t$ is the number of samples seen so far

• Use a held-out test set: increase the learning rate when likelihood increases

Multiclass version
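The slide body isn't preserved in this extraction; the standard multiclass (softmax) form keeps one weight vector $w_y$ per class:

$$p(y \mid x; w) = \frac{\exp\bigl(w_y \cdot \phi(x)\bigr)}{\sum_{y'} \exp\bigl(w_{y'} \cdot \phi(x)\bigr)}$$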

Some models are better than others…

• Consider these 2 examples

• Which of the 2 models below is better?

Classifier 2 will probably generalize better! It does not include irrelevant information.

=> The smaller model is better

Regularization

• A penalty on adding extra weights

• L2 regularization:

– big penalty on large weights

– small penalty on small weights

• L1 regularization:

– uniform increase whether weights are large or small

– will cause many weights to become zero

[Figure: L1 vs. L2 penalties plotted over weights $w_1$ and $w_2$]
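Combined with the training objective (a standard formulation; $\lambda$ is an assumed symbol for the regularization strength):

$$\hat{w} = \arg\max_{w} \sum_{i} \log p(y_i \mid x_i; w) - \lambda R(w), \qquad R(w) = \|w\|_2^2 \ \text{(L2)} \quad \text{or} \quad \|w\|_1 \ \text{(L1)}$$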

L1 regularization in online learning
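The slide body isn't preserved here. One common family of techniques (a truncated-gradient-style sketch; lam is an assumed name for the regularization strength) applies a small clipped shrinkage toward zero after each SGD step, so many weights become exactly zero:

# L1 in online learning: after each gradient step, shrink each
# touched weight toward zero by alpha * lam, clipping at zero.
def l1_shrink(w, features, alpha, lam):
    for f in features:
        v = w.get(f, 0.0)
        if v > 0.0:
            w[f] = max(0.0, v - alpha * lam)
        elif v < 0.0:
            w[f] = min(0.0, v + alpha * lam)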

What you should know

• Standard supervised learning set-up for text classification

– Difference between train vs. test data

– How to evaluate

• 3 examples of supervised linear classifiers

– Naïve Bayes, Perceptron, Logistic Regression

– Learning as optimization: what is the objective function being optimized?

– Difference between generative vs. discriminative classifiers

– Bias-Variance trade-off, smoothing, regularization
