Artificial Neural Networks: Intro
CSC411: Machine Learning and Data Mining, Winter 2017
Michael Guerzhoy
“Making Connections” by Filomena Booth (2013)
Slides from Andrew Ng, Geoffrey Hinton, and Tom Mitchell
Non-Linear Decision Surfaces
[Figure: two classes of points in the (x1, x2) plane]
• There is no linear decision boundary
Car Classification
[Figure: training images labeled "Cars" and "Not a car"; a new test image with the caption "Testing: What is this?"]
You see this: [photo of a car]
But the camera sees this: [a grid of pixel intensity values]
Learning Algorithm
[Figure: raw "Cars" and "Non-Cars" images plotted as points in (pixel 1, pixel 2) space, fed to a learning algorithm]
• 50 x 50 pixel images → 2500 pixels (7500 if RGB)
• Feature vector: $x = (\text{pixel 1 intensity}, \text{pixel 2 intensity}, \ldots, \text{pixel 2500 intensity})$
• Quadratic features ($x_i x_j$): $2500 \cdot 2501 / 2 = 3{,}126{,}250 \approx 3$ million features
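To make the feature-explosion claim concrete, here is the arithmetic as a tiny sketch (the grayscale 50 x 50 case from the slide):

```python
n = 50 * 50                   # 2500 raw pixel-intensity features
quadratic = n * (n + 1) // 2  # all products x_i * x_j with i <= j
print(n, quadratic)           # 2500 3126250  (≈ 3 million)
```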
Simple Non-Linear Classification Example
[Figure: a small dataset in the (x1, x2) plane that no straight line can separate]
Linear Neuron
[Diagram: inputs $x_1, x_2, x_3$ feed a linear neuron with weights $w_1, w_2, w_3$ and bias $w_0$]
• Output: $w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3$
Linear Neuron: Cost Function
• Any number of choices. The one made for linear regression is
$$\sum_{i=1}^{m} \left( y^{(i)} - w^T x^{(i)} \right)^2$$
• Can minimize using gradient descent to obtain the best weights w for the training set
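As a minimal sketch of that minimization, here is batch gradient descent on the quadratic cost for a linear neuron; the dataset, learning rate, and iteration count below are made up for illustration:

```python
import numpy as np

# Toy data: m examples with 3 features, generated from known weights plus noise.
rng = np.random.default_rng(0)
m = 100
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 3))])  # column of 1s plays the role of x0
true_w = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=m)

def cost(w):
    # sum_i (y^(i) - w^T x^(i))^2, as on the slide
    return np.sum((y - X @ w) ** 2)

w = np.zeros(4)
alpha = 0.001  # learning rate, chosen by hand for this toy problem
for _ in range(1000):
    grad = -2 * X.T @ (y - X @ w)  # gradient of the quadratic cost
    w -= alpha * grad

print(cost(w))  # small
print(w)        # close to true_w
```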
Logistic Neuron
[Diagram: inputs $x_1, x_2, x_3$ feed a logistic neuron with weights $w_1, w_2, w_3$ and bias $w_0$]
• Output: $\sigma(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3)$, where $\sigma(t) = \dfrac{1}{1 + \exp(-t)}$
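A direct transcription of this unit, with weights made up for illustration:

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

# Logistic neuron with (made-up) bias w0 = -1 and weights (2, -1, 0.5).
# The output is always strictly between 0 and 1.
def neuron(x1, x2, x3):
    return sigma(-1 + 2 * x1 - 1 * x2 + 0.5 * x3)

print(neuron(1.0, 0.0, 0.0))  # sigma(1) ≈ 0.731
```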
Logistic Neuron: Cost Function
• Could use the quadratic cost function again
• Could use the “log-loss” function to make the neuron perform logistic regression
$$-\sum_{i=1}^{m} \left[ y^{(i)} \log \frac{1}{1 + \exp(-w^T x^{(i)})} + \left(1 - y^{(i)}\right) \log \frac{\exp(-w^T x^{(i)})}{1 + \exp(-w^T x^{(i)})} \right]$$
(Note: we derived this cost function by saying we want to maximize the likelihood of the data under a certain model, but there’s nothing stopping us from just making up a loss function)
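A direct transcription of the log-loss above. Since $1 - \sigma(t) = \exp(-t)/(1 + \exp(-t))$, the second term is just $\log(1 - p)$; in practice one would compute this from the logits to avoid overflow, but this sketch keeps the slide's form:

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_loss(w, X, y):
    """X: m x n design matrix with x0 = 1 in column 0; y: 0/1 labels."""
    p = sigma(X @ w)  # predicted P(y = 1 | x) for each example
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# With w = 0, every prediction is 0.5, so the loss is m * log(2).
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1])
print(log_loss(np.zeros(2), X, y))  # 2 * log(2) ≈ 1.386
```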
Logistic Regression Cost Function: Another Look
• $\mathrm{Cost}(h_w(x), y) = \begin{cases} -\log h_w(x) & \text{if } y = 1 \\ -\log(1 - h_w(x)) & \text{if } y = 0 \end{cases}$
• If y = 1, want the cost to be small if $h_w(x)$ is close to 1 and large if $h_w(x)$ is close to 0
  • $-\log(t)$ is 0 at $t = 1$ and goes to infinity as $t \to 0$
• If y = 0, want the cost to be small if $h_w(x)$ is close to 0 and large if $h_w(x)$ is close to 1
• Note: $0 < \sigma(t) < 1$, where $\sigma(t) = \dfrac{1}{1 + \exp(-t)}$
Multilayer Neural Networks
• $h_{i,j} = g(W_{i,j} \cdot x) = g\left(\sum_k W_{i,j,k} x_k\right)$ (see the sketch after the diagram below)
• $x_0 = 1$ always
• $W_{i,j,0}$ is the "bias"
• g is the activation function
  • Could be $g(t) = t$
  • Could be $g(t) = \sigma(t)$
• Nobody uses those anymore…
[Diagram: input units $x_1, x_2, x_3$ at the bottom, hidden units $h_{i,j}$ in the middle, output units $o_1, o_2$ at the top]
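A minimal forward-pass sketch matching this setup, with made-up weights. Each weight matrix has one row per unit, its 0th column is the bias, and $x_0 = 1$ is prepended at every layer:

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

# One layer: prepend x0 = 1, then compute h_j = g(sum_k W[j, k] * x_k).
def layer(W, x, g):
    x = np.concatenate(([1.0], x))  # x0 = 1 always
    return g(W @ x)

# A tiny 3-input, 2-hidden-unit, 2-output network with made-up weights.
W1 = np.array([[0.1, 0.5, -0.3, 0.8],
               [-0.2, 0.4, 0.9, -0.5]])  # hidden layer: 2 units, bias + 3 inputs
W2 = np.array([[0.0, 1.0, -1.0],
               [0.3, -0.7, 0.2]])         # output layer: 2 units, bias + 2 hidden

x = np.array([1.0, 2.0, 3.0])
h = layer(W1, x, sigma)  # hidden activations
o = layer(W2, h, sigma)  # outputs
print(h, o)
```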
Multilayer Neural Network: Speech Recognition Example
How to compute AND?
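A single logistic neuron is enough: pick weights so the pre-activation is strongly negative unless both inputs are 1. The weights below are one standard choice, not the only one:

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

# AND as one logistic neuron: w0 = -30, w1 = w2 = 20.
# Pre-activations for the four inputs are -30, -10, -10, +10,
# so sigma saturates to ~0, ~0, ~0, ~1.
def AND(x1, x2):
    return sigma(-30 + 20 * x1 + 20 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(AND(x1, x2)))
```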
How to compute XOR?
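XOR is not linearly separable, so no single neuron can compute it; a hidden layer is needed. One construction (an assumed decomposition, XOR = AND(OR, NAND), not necessarily the one from lecture) as a sketch:

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

# Two hidden logistic units feed one output logistic unit.
def XOR(x1, x2):
    h1 = sigma(-10 + 20 * x1 + 20 * x2)    # OR
    h2 = sigma(30 - 20 * x1 - 20 * x2)     # NAND (NOT AND)
    return sigma(-30 + 20 * h1 + 20 * h2)  # AND of the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(XOR(x1, x2)))
```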