From Logistic Regression to Neural Networks


CMSC 470

Marine Carpuat

Logistic Regression: What you should know

How to make a prediction with a logistic regression classifier

How to train a logistic regression classifier

Machine learning concepts:

Loss function

Gradient Descent Algorithm

SGD hyperparameter: the learning rate

• The hyperparameter η that controls the size of the step down the gradient is called the learning rate

• If η is too large, training might not converge; if η is too small, training might be very slow.

• How to set the learning rate? Common strategies:

  • Decay over time: η = 1/(C + t), where C is a constant hyperparameter set by the user and t is the number of samples seen so far (sketched below)

  • Use a held-out test set: increase the learning rate when held-out likelihood increases
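A minimal sketch of this decay schedule inside an SGD loop; the gradient function, the data, and the default value of C here are illustrative assumptions:

```python
def sgd_with_decay(gradient, w, data, C=10.0, epochs=2):
    """SGD where the learning rate decays as eta = 1 / (C + t).

    gradient(w, x, y) returns the gradient of the per-example loss;
    C is the user-set constant, t counts examples seen so far.
    """
    t = 0
    for _ in range(epochs):
        for x, y in data:
            eta = 1.0 / (C + t)              # decay over time
            w = w - eta * gradient(w, x, y)  # step down the gradient
            t += 1
    return w
```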

Multiclass Logistic Regression

Formalizing classification

Task definition

• Given inputs:

  • an example x (often x is a D-dimensional vector of binary or real values)

  • a fixed set of classes Y = {y1, y2, …, yJ}, e.g. word senses from WordNet

• Output: a predicted class y ∈ Y

Classifier definition

A function g: x ↦ g(x) = y

Many different types of functions/classifiers can be defined

• We’ll talk about perceptron, logistic regression, neural networks.

So far we’ve only worked with binary classification problems, i.e. J = 2

A multiclass logistic regression classifier

aka multinomial logistic regression, softmax logistic regression, maximum entropy (or maxent) classifier

Goal: predict probability P(y=c|x), where c is one of k classes in set C

The softmax function

• A generalization of the sigmoid

• Input: a vector z of dimensionality k

• Output: a vector of dimensionality k, where component i is softmax(z)[i] = exp(z[i]) / Σ_j exp(z[j])

Looks like a probability distribution!

The softmax function: Example

All values are in [0,1] and sum up to 1: they can be interpreted as probabilities!
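A minimal NumPy sketch of the softmax; the input scores are made up for this example:

```python
import numpy as np

def softmax(z):
    """Map a k-dimensional vector of scores to a k-dimensional
    probability distribution."""
    z = z - np.max(z)       # shift for numerical stability (result unchanged)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

z = np.array([0.6, 1.1, -1.5, 1.2, 3.2, -1.1])   # illustrative scores
p = softmax(z)
print(p)          # all values in [0, 1]
print(p.sum())    # sums to 1.0
```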

A multiclass logistic regression classifier

Model definition: we now have one weight vector and one bias PER CLASS, and the softmax turns the per-class scores into probabilities:

P(y = c | x) = exp(w_c · x + b_c) / Σ_{c' ∈ C} exp(w_{c'} · x + b_{c'})
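A minimal sketch of prediction with this model, assuming a weight matrix W with one row (weight vector) per class and a bias vector b with one entry per class; the names and shapes are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def predict_proba(W, b, x):
    """One weight vector (row of W) and one bias (entry of b) per class."""
    scores = W @ x + b        # score for each class c: w_c . x + b_c
    return softmax(scores)    # P(y = c | x) for every class c

def predict(W, b, x):
    """Return the index of the most probable class."""
    return int(np.argmax(predict_proba(W, b, x)))
```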

Features in multiclass logistic regression

• Features are a function of the input example and of a candidate output class c

• f_i(x, c) represents feature i for a particular class c for a given example x

Example: sentiment analysis with 3 classes {positive (+), negative (-), neutral (0)}

• Starting from the features for binary classification

• We create one copy of each feature per class (see the sketch below)
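A minimal sketch of this per-class feature copying; the base feature names and values are illustrative assumptions:

```python
def class_features(base_features, c, classes):
    """Conjoin each binary-classification feature with a candidate class:
    the copy for class c is active only when we score class c."""
    f = {}
    for name, value in base_features.items():
        for c_prime in classes:
            # the copy for class c_prime fires only when scoring c_prime
            f[(name, c_prime)] = value if c_prime == c else 0.0
    return f

base = {"contains_great": 1.0, "word_count": 12.0}   # illustrative features
print(class_features(base, "+", ["+", "-", "0"]))
```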

Learning in Multiclass Logistic Regression

• Loss function for a single example: the cross-entropy loss

  L(ŷ, y) = - Σ_{c ∈ C} 1{y = c} log P(y = c | x) = - log P(y = c* | x), where c* is the correct class

1{ } is an indicator function that evaluates to 1 if the condition in the brackets is true, and to 0 otherwise
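A minimal sketch of this loss and its gradient for one example, reusing the softmax above; the variable names are illustrative, and the gradient (p - 1{y = c}) per class score is the standard one for softmax regression:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def loss_and_gradients(W, b, x, y):
    """Cross-entropy loss and gradients for a single example (x, y).

    W: (num_classes, D) weight matrix, b: (num_classes,) biases,
    y: index of the correct class.
    """
    p = softmax(W @ x + b)        # P(y = c | x) for each class c
    loss = -np.log(p[y])          # -log P(correct class | x)
    indicator = np.zeros_like(p)
    indicator[y] = 1.0            # the 1{y = c} vector
    dscores = p - indicator       # gradient w.r.t. the class scores
    dW = np.outer(dscores, x)     # gradient w.r.t. W
    db = dscores                  # gradient w.r.t. b
    return loss, dW, db
```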

Logistic Regression: What you should know

How to make a prediction with a logistic regression classifier

How to train a logistic regression classifier

For both binary and multiclass problems

Machine learning concepts:

Loss function

Gradient Descent Algorithm

Learning rate

Neural Networks

From logistic regression to a neural network unit

Limitation of perceptron

● can only find linear separations between positive and negative examples

[Figure: four points in an XOR-style arrangement, with X and O at opposite corners, which no single line can separate]

Example: binary classification with a neural network

● Create two classifiers

[Figure: the four points φ0(x1) = {-1, 1}, φ0(x2) = {1, 1}, φ0(x3) = {-1, -1}, φ0(x4) = {1, -1}, and two perceptron-style units. Each unit reads the inputs φ0[0], φ0[1] plus a constant bias input 1, applies its weights (w0,0, b0,0) and (w0,1, b0,1) respectively, and passes the result through sign to produce φ1[0] and φ1[1].]

Example: binary classification with a neural network

● These classifiers map to a new space

[Figure: with weights (1, 1) and bias -1 for the first unit and weights (-1, -1) and bias -1 for the second, the two classifiers map the four points into a new space φ1: φ1(x1) = {-1, -1}, φ1(x2) = {1, -1}, φ1(x3) = {-1, 1}, φ1(x4) = {-1, -1}. In this new space the X and O examples are linearly separable, and a final unit with weights (1, 1) and bias 1 computes the prediction φ2[0] = y.]

Example: binary classification with a neural network

Example: the final network can correctly classify the examples that the perceptron could not.

[Figure: the same network with tanh in place of sign. Hidden units: φ1[0] = tanh(φ0[0] + φ0[1] - 1) and φ1[1] = tanh(-φ0[0] - φ0[1] - 1); output unit: φ2[0] = tanh(φ1[0] + φ1[1] + 1).]

Replace “sign” with a smoother non-linear function (e.g. tanh, sigmoid)
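A minimal NumPy sketch of this network. The weights are read off the reconstructed figure above, so treat them as an assumption; with them, the network separates all four XOR-style points:

```python
import numpy as np

# Hidden layer: weights (1, 1) with bias -1, and (-1, -1) with bias -1
W1 = np.array([[ 1.0,  1.0],
               [-1.0, -1.0]])
b1 = np.array([-1.0, -1.0])
# Output layer: weights (1, 1) with bias 1
w2 = np.array([1.0, 1.0])
b2 = 1.0

def predict(phi0):
    phi1 = np.tanh(W1 @ phi0 + b1)   # hidden features
    phi2 = np.tanh(w2 @ phi1 + b2)   # output score
    return phi2

# X points get positive scores, O points negative scores
for point, label in [((-1, 1), "O"), ((1, 1), "X"),
                     ((-1, -1), "X"), ((1, -1), "O")]:
    print(point, label, predict(np.array(point, dtype=float)))
```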

Feedforward Neural Networks

Components:

• an input layer

• an output layer

• one or more hidden layers

In a fully connected network: each hidden unit takes as input all the units in the previous layer

No loops!

A 2-layer feedforward neural network
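In equations, a 2-layer fully connected network can be written as below. This is the standard formulation with conventional symbols (W, b, g), not notation taken from the slide’s figure; g is the hidden activation function, and for classification the output layer is typically a softmax:

```latex
h = g\left(W^{(1)} x + b^{(1)}\right), \qquad
\hat{y} = \mathrm{softmax}\left(W^{(2)} h + b^{(2)}\right)
```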

Designing Neural Networks: Activation functions

• Hidden layer can be viewed as a set of hidden features

• The output of the hidden layer indicates the extent to which each hidden feature is “activated” by a given input

• The activation function is a non-linear function that determines range of hidden feature values
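Minimal sketches of common activation functions; these are the standard definitions, not values specific to these slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # another common choice: range [0, inf)
```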


Designing Neural Networks: Network structure

• 2 key decisions:

  • Width (number of nodes per layer)

  • Depth (number of hidden layers)

• More parameters mean that the network can learn more complex functions of the input

Forward Propagation: For a given network, and some input values, compute output


Given input (1,0) (and sigmoid non-linearities), we can calculate the output by processing one layer at a time:


Output table for all possible inputs:
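The slide’s network and weight values appear only in its figure, so here is a hedged sketch: a 2-layer network with sigmoid units and illustrative, assumed weights (they happen to compute XOR), run on the worked-example input (1, 0) and then on all four possible inputs to produce the output table:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed, illustrative weights (the slide's actual values are in its figure)
W1 = np.array([[ 20.0,  20.0],    # hidden unit 0: roughly an OR gate
               [-20.0, -20.0]])   # hidden unit 1: roughly a NAND gate
b1 = np.array([-10.0, 30.0])
w2 = np.array([20.0, 20.0])       # output unit: roughly an AND of the two
b2 = -30.0

def forward(x):
    """Forward propagation: process one layer at a time."""
    h = sigmoid(W1 @ x + b1)      # hidden layer
    return sigmoid(w2 @ h + b2)   # output layer

print(forward(np.array([1.0, 0.0])))   # the worked example: input (1, 0)

# Output table for all possible inputs
for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print((x1, x2), "->", round(float(forward(np.array([x1, x2]))), 3))
```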

Neural Networks as Computation Graphs

Computation Graphs Make Prediction Easy: Forward propagation consists of traversing the graph in topological order
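A minimal sketch of a computation graph evaluated in topological order; the node set, operations, and input values are illustrative assumptions:

```python
import numpy as np

# Each node: (name, function, list of input node names).
# Listing nodes in topological order guarantees every input is computed
# before the node that consumes it.
graph = [
    ("x", None, []),                                   # input
    ("W", None, []),                                   # parameter
    ("b", None, []),                                   # parameter
    ("z", lambda W, x, b: W @ x + b, ["W", "x", "b"]), # affine transform
    ("h", lambda z: np.tanh(z), ["z"]),                # non-linear activation
]

def forward(graph, inputs):
    """Traverse the graph in topological order, computing each node."""
    values = dict(inputs)
    for name, fn, parents in graph:
        if fn is not None:
            values[name] = fn(*(values[p] for p in parents))
    return values

values = forward(graph, {
    "x": np.array([1.0, 0.0]),
    "W": np.array([[1.0, -1.0], [0.5, 2.0]]),
    "b": np.array([0.0, -1.0]),
})
print(values["h"])
```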

Neural Networks so far

• Powerful non-linear models for classification

• Predictions are made as a sequence of simple operations

  • matrix-vector operations

  • non-linear activation functions

• Choices in network structure

  • Width and depth

  • Choice of activation function

• Feedforward networks

  • no loops

• Next: how to train
