Introduction to Machine Learning10701/slides/8_Perceptron.pdf · 2017. 9. 28. · Introduction to...

Introduction to Machine Learning

Perceptron

Barnabás Póczos

Contents

History of Artificial Neural Networks

Definitions: Perceptron, Multi-Layer Perceptron

Perceptron algorithm

Short History of Artificial Neural Networks

Progression (1943-1960)

• First mathematical model of neurons

▪ Pitts & McCulloch (1943)

• Beginning of artificial neural networks

• Perceptron, Rosenblatt (1958)

▪ A single neuron for classification

▪ Perceptron learning rule

▪ Perceptron convergence theorem

Degression (1960-1980)

• Perceptron can’t even learn the XOR function

• We don’t know how to train MLP

• 1963 Backpropagation… but not much attention…

Bryson, A.E.; W.F. Denham; S.E. Dreyfus. Optimal programming problems with inequality constraints. I: Necessary conditions for extremal solutions. AIAA J. 1, 11 (1963) 2544-2550

Short History

Progression (1980-)

• 1986 Backpropagation reinvented:

▪ Rumelhart, Hinton, Williams: Learning representations by back-propagating errors. Nature, 323, 533—536, 1986

• Successful applications:

▪ Character recognition, autonomous cars,…

• Open questions: Overfitting? Network structure? Neuron number? Layer number? Bad local minimum points? When to stop training?

• Hopfield nets (1982), Boltzmann machines,…

Short History

Degression (1993-)

• SVM: Vapnik and his co-workers developed the Support Vector Machine (1993). It is a shallow architecture.

• SVM and Graphical models almost kill the ANN research.

• Training deeper networks consistently yields poor results.

• Exception: deep convolutional neural networks, YannLeCun 1998. (discriminative model)

Short History

Deep Belief Networks (DBN)

• Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.

• Generative graphical model

• Based on restrictive Boltzmann machines

• Can be trained efficiently

Deep Autoencoder based networks

Bengio, Y., Lamblin, P., Popovici, P., Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks, Advances in Neural Information Processing Systems 19

Convolutional neural networks running on GPUsAlex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, Advances in Neural Information Processing Systems 2012

Progression (2006-)

The Neuron

– Each neuron has a body, axon, and many dendrites

– A neuron can fire or rest

– If the sum of weighted inputs larger than a threshold, then the neuron

fires.

– Synapses: The gap between the axon and other neuron’s dendrites. It

determines the weights in the sum.

The Mathematical Model of a Neuron

• Identity function

• Threshold function

(perceptron)

• Ramp function

Typical activation functions

• Logistic function

• Hyperbolic tangent function

• Rectified Linear Unit (ReLU)

• Exponential Linear Unit

• Softplus function(This is a smooth approximation of ReLU)

• Leaky ReLU

Structure of Neural Networks

Input neurons, Hidden neurons, Output neurons

Fully Connected Neural Network

Layers, Feedforward neural networks

Convention: The input layer is Layer 0.

• Multilayer perceptron: Connections only between Layer i and Layer i+1

• The most popular architecture.

Multilayer Perceptron

Recurrent Neural Networks

Recurrent NN: there are connections backwards too.

The Perceptron

The Training Set

The Perceptron

Matlab: opengl hardwarebasic, nnd4pr

Matlab demos: nnd3pc

The Perceptron Algorithm

The Perceptron algorithm

The perceptron learning algorithm

The perceptron algorithm

Observation

How can we remember this rule?

An interesting property:

we do not require the learning rate to go to zero!

Perceptron Convergence

Lemma Using this notation, the update rule can be written as

Perceptron Convergence

Lower bound

Upper bound

Therefore,

Upper bound

Therefore,

Take me home!

History of Neural Networks

Mathematical model of the neuron

Activation Functions

Perceptron definition

Perceptron algorithm

Perceptron Convergence Theorem

Introduction to Machine Learning10701/slides/8_Perceptron.pdf · 2017. 9. 28. · Introduction to...

Documents

B: Chapter 12 HTF: Chapter 14.5 Principal Component Analysisbapoczos/other_presentations/PCA_24_10... · Principal Component Analysis Barnabás Póczos University of Alberta Nov 24,

Semi-Supervised Learning10701/slides/17_SSL.pdfSemi-supervised SVM Many others 9. Notation 10. Self-training 11. Self-training Example Propagating 1-NN 12. Mixture Models for Labeled

HACKTIVITY · AMAN SACHDEV & HIMANSHU SHARMA | Zero to Millionaire in 60 Minutes: Hacking Real Life Financial Applications BARNABÁS SZTÁN-KOVÁCS | Notwork Security FRANTIŠEK STŘASÁK

Bártfai Barnabás Microsoft World 2007 Zsebkönyv

Introduction to Machine Learning10701/slides/10_Deep_Learning.pdf · 2017-10-04 · Introduction to Machine Learning Deep Learning Barnabás Póczos. 2 Credits Many of the pictures,

Entropy and Dependence Estimation Barnabás Póczos · 2009-07-28 · Entropy and Dependence Estimation Barnabás Póczos Department of Computing Science, ... Contents •Entropy

Barnabás Póczos - Carnegie Mellon School of Computer …aarti/Class/10701_Spring14/slides/kernel_methods.… · the test data as well ... Not a big surprise x=0 ... 2 2: 2: 2 2

Deep Sets - Neural Information Processing Systemspapers.nips.cc/paper/6931-deep-sets.pdf · Deep Sets Manzil Zaheer 1;2, Satwik Kottur , Siamak Ravanbhakhsh , Barnabás Póczos 1,

authors index - Conferences.hu · Fortuna, Luigi [037], [039], [392] Fosson, Sophie M. [009] Frasca, Mattia [037], [392] Frazzoli, Emilio [427] ... Garatti, Simone [167] Garay, Barnabás

Introduction to Machine Learning10701/slides/21_Undirected_Graphical_Models.pdf · Example: Image Denoising 12 Let us look at the example of noise removal from a binary image. The

Charging solutions Digital customer journey with ELMŰ ÉMÁSZ...Digital customer journey with ELMŰ-ÉMÁSZ Barnabás Pálfy | Budapest | 2020.03.10. 2 Charging solutions User experience

Approximate Ray-Tracing on the GPU with Distance Impostors László Szirmay-Kalos Barnabás Aszódi István Lazányi Mátyás Premecz TU Budapest, Hungary

Introduction to Machine Learning10701/slides/16_ICA.pdfICA weights (mixing matrix) help find location of sources ICA Application, Removing Artifacts from EEG. Fig from Jung 14 ICA

Introduction to Machine Learning - Alex Smola › ... › slides › 8_stochconvergence.pdfIntroduction to Machine Learning CMU-10701 8. Stochastic Convergence Barnabás Póczos Motivation

Introduction to Machine Learning CMU-10701aarti/Class/10701_Spring14/slides/DeepLearning.pdf · Introduction to Machine Learning CMU-10701 Deep Learning Barnabás Póczos & Aarti

Introduction to Machine Learning10701/slides/13_Active_Learning.pdf · Higher Dimensions Visualizing > 3 dimensional Gaussian random . variables is… difficult Means and variances

Online Group-Structured Dictionary Learningbapoczos/articles/szabo11cvpr.pdfOnline Group-Structured Dictionary Learning Zoltán Szabó1 Barnabás Póczos2 András Lorincz˝ 1 1School

Reinforcement Learning - Carnegie Mellon School …mgormley/courses/10601-s17/...Introduction to Machine Learning Reinforcement Learning Barnabás Póczos TexPoint fonts used in EMF

The Effect of Anti-fracking Movements on European Shale ...The Effect of Anti-fracking Movements on European Shale Gas Policies: Poland and France By Barnabás Áron Kádár ... Timothé

Principal Component Analysis Barnabás Póczos University of Alberta Nov 24, 2009 B: Chapter 12 HRF: Chapter 14.5