Source: morse.uml.edu/Activities.d/CDM/docs/neuralnets.pdf

Intro to Neural Networks

Prof. Kate Saenko

Today

• Neural Networks

– aka Multilayer Perceptrons,

– aka Deep Belief Networks/Deep Learning

• Unsupervised Neural Networks

– Autoencoders

– Sparse Autoencoders

Intro to Neural Networks

Slides by Andrew Ng

Neural Networks

Origins: Algorithms that try to mimic the brain. Very widely used in the 80s and early 90s; popularity diminished in the late 90s. Recent resurgence: state-of-the-art technique for many applications.

The “one learning algorithm” hypothesis

• Auditory cortex learns to see [Roe et al., 1992]

• Somatosensory cortex learns to see [Metin & Frost, 1989]

Sensor representations in the brain

• Seeing with your tongue

• Human echolocation (sonar)

• Haptic belt: Direction sense

• Implanting a 3rd eye

[BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]

Neuron in the brain

Neurons in the brain

[Credit: US National Institutes of Health, National Institute on Aging]

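Neuron model: Logistic unit

Sigmoid (logistic) activation function: $g(z) = \frac{1}{1 + e^{-z}}$

One Unit is a Logistic Regression Classifier

Want $0 \le h_\theta(x) \le 1$:

$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$

[Figure: plot of the sigmoid (logistic) function $g(z)$, rising from 0 through 0.5 at $z = 0$ toward 1]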

Interpretation of Hypothesis Output

$h_\theta(x)$ = estimated probability that y = 1 on input x

$h_\theta(x) = P(y = 1 \mid x; \theta)$: “probability that y = 1, given x, parameterized by $\theta$”

Example: If $h_\theta(x) = 0.7$, tell the patient there is a 70% chance of the tumor being malignant.
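A single logistic unit can be sketched in a few lines of NumPy. This is only an illustration: the parameter vector `theta` and the input `x` are made-up values, and `x[0] = 1` is assumed to be the bias term.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), read as the estimated P(y = 1 | x; theta)."""
    return sigmoid(theta @ x)

# Hypothetical parameters and input (not from the slides); x[0] = 1 is the bias.
theta = np.array([-3.0, 0.5])
x = np.array([1.0, 7.7])

p = hypothesis(theta, x)
print(f"estimated P(y = 1 | x) = {p:.2f}")  # about 0.70
```

With these made-up numbers the unit outputs roughly 0.70, which would be read as a 70% estimated chance that y = 1.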

Neural Network

[Figure: a three-layer network with Layer 1, Layer 2, and Layer 3]

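Neural Network

$a_i^{(j)}$ = “activation” of unit $i$ in layer $j$

$\Theta^{(j)}$ = matrix of weights controlling function mapping from layer $j$ to layer $j+1$

If network has $s_j$ units in layer $j$, $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j + 1)$.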

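Forward propagation: Vectorized implementation

$z^{(2)} = \Theta^{(1)} a^{(1)}$, where $a^{(1)} = x$ (with $x_0 = 1$)

$a^{(2)} = g(z^{(2)})$

Add $a_0^{(2)} = 1$.

$z^{(3)} = \Theta^{(2)} a^{(2)}$

$h_\Theta(x) = a^{(3)} = g(z^{(3)})$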
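A vectorized forward pass for a three-layer network might look like the sketch below. The layer sizes and random weights are assumptions chosen for illustration; the only convention taken from the slides is that each weight matrix has one extra column for the bias unit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Forward pass for one input x (given without its bias entry)."""
    a1 = np.concatenate(([1.0], x))             # add bias unit x_0 = 1
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # add bias unit a_0^(2) = 1
    z3 = Theta2 @ a2
    return sigmoid(z3)                          # h_Theta(x) = a^(3)

# Hypothetical sizes: 3 inputs, 5 hidden units, 1 output unit.
rng = np.random.default_rng(0)
s1, s2, s3 = 3, 5, 1
Theta1 = rng.normal(scale=0.1, size=(s2, s1 + 1))   # s_2 x (s_1 + 1)
Theta2 = rng.normal(scale=0.1, size=(s3, s2 + 1))   # s_3 x (s_2 + 1)

h = forward(np.array([0.5, -1.2, 2.0]), Theta1, Theta2)
```

Each layer is one matrix-vector product followed by an elementwise sigmoid, so no explicit loop over units is needed.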

Neural Network learning its own features

Other network architectures

[Figure: a network with four layers, Layer 1 through Layer 4]

Handwritten digit classification

[Courtesy of Yann LeCun]

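Neural Network (Classification)

$L$ = total no. of layers in network

$s_l$ = no. of units (not counting bias unit) in layer $l$

[Figure: a four-layer network, Layer 1 through Layer 4]

Binary classification: $y \in \{0, 1\}$, 1 output unit

Multi-class classification (K classes): $y \in \mathbb{R}^K$, K output units

E.g. $[1, 0, 0, 0]^T$ (pedestrian), $[0, 1, 0, 0]^T$ (car), $[0, 0, 1, 0]^T$ (motorcycle), $[0, 0, 0, 1]^T$ (truck)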
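The K-output encoding for the four example classes can be sketched as follows. The helper names `one_hot` and `predict_class` are hypothetical, and picking the class with the largest output is an assumed (though common) decision rule that the slide does not state.

```python
import numpy as np

classes = ["pedestrian", "car", "motorcycle", "truck"]   # K = 4

def one_hot(label, classes):
    """Target vector y in R^K with a 1 at the position of the true class."""
    y = np.zeros(len(classes))
    y[classes.index(label)] = 1.0
    return y

def predict_class(h, classes):
    """Assumed rule: pick the class whose output unit is largest."""
    return classes[int(np.argmax(h))]

y = one_hot("motorcycle", classes)
guess = predict_class(np.array([0.1, 0.7, 0.1, 0.1]), classes)
```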

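Cost function

Neural network:

$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log (h_\Theta(x^{(i)}))_k + (1 - y_k^{(i)}) \log \left(1 - (h_\Theta(x^{(i)}))_k\right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_{l+1}} \sum_{j=1}^{s_l} \left(\Theta_{ij}^{(l)}\right)^2$

The first term is the training error; the second term is the regularization.

Here $m$ is the number of training examples, $K$ the number of output units, $L$ the total number of layers, $s_l$ the number of units (not counting the bias unit) in layer $l$, and $\lambda$ the regularization strength. Bias weights ($j = 0$) are not regularized.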
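The two terms of the cost can be computed separately, as in the sketch below. It assumes the training error is the cross-entropy summed over the K output units, that regularization is the sum of squared weights with the bias column excluded, and that the network outputs `H` have already been computed by a forward pass; the function name `nn_cost` is hypothetical.

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """H: (m, K) network outputs; Y: (m, K) one-hot targets;
    Thetas: list of weight matrices whose column 0 holds the bias weights."""
    m = Y.shape[0]
    training_error = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Skip column 0 so that bias weights are not penalized.
    regularization = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return training_error + regularization

# Tiny made-up example: 2 examples, 2 classes, every output equal to 0.5.
H = np.full((2, 2), 0.5)
Y = np.eye(2)
Thetas = [np.ones((2, 3))]
cost_no_reg = nn_cost(H, Y, Thetas, lam=0.0)
cost_reg = nn_cost(H, Y, Thetas, lam=1.0)
```

Setting `lam=0` leaves only the training error, which makes it easy to see how much the regularization term contributes.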

Cost function: Training Error

Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$ (m examples)

How to choose parameters $\theta$?

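Linear regression: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

With a sigmoid hypothesis $h_\theta$, this squared-error cost is non-convex in $\theta$, which motivates a different cost for classification.

[Figure: two plots of $J(\theta)$, one labeled “non-convex” and one labeled “convex”]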

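If y = 1: $\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$

[Figure: cost plotted against $h_\theta(x)$ from 0 to 1]

If y = 0: $\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$

[Figure: cost plotted against $h_\theta(x)$ from 0 to 1]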

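To fit parameters $\theta$: $\min_\theta J(\theta)$, where $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$

To make a prediction given new $x$: Output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$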
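Fitting and predicting with a single logistic unit can be sketched as below. The slides do not say which optimizer is used, so plain batch gradient descent is an assumption here, as are the learning rate, the iteration count, and the toy data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Logistic regression cost J(theta) (average cross-entropy)."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def fit(X, y, alpha=0.1, iters=2000):
    """Assumed optimizer: batch gradient descent on J(theta)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta -= alpha * grad
    return theta

# Made-up data; the first column is the bias feature x_0 = 1.
X = np.array([[1, 0.5], [1, 1.5], [1, 3.0], [1, 4.0]], dtype=float)
y = np.array([0, 0, 1, 1], dtype=float)

theta = fit(X, y)
p_new = sigmoid(np.array([1.0, 3.5]) @ theta)   # prediction for a new x
```

The prediction step is just the hypothesis evaluated at the fitted parameters; thresholding `p_new` at 0.5 would turn it into a class label.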


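Cost function

Neural network: $J(\Theta)$ = training error + regularization

Gradient computation

Need code to compute:

- $J(\Theta)$

- $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$

Use “Backpropagation algorithm”:

- Efficient way to compute $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$

- Computes gradient incrementally by “propagating” backwards through the network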
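A minimal sketch of backpropagation for a three-layer sigmoid network follows. To keep it short it assumes the unregularized cross-entropy cost, and it compares one analytic gradient entry against a finite-difference estimate; the data, layer sizes, and the name `cost_and_grads` are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(Theta1, Theta2, X, Y):
    """X: (m, n) inputs without bias column; Y: (m, K) targets in {0, 1}."""
    m = X.shape[0]
    # Forward propagation, adding a bias column at each layer.
    A1 = np.hstack([np.ones((m, 1)), X])
    A2 = np.hstack([np.ones((m, 1)), sigmoid(A1 @ Theta1.T)])
    H = sigmoid(A2 @ Theta2.T)
    J = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Backward pass: output error first, then the hidden-layer error.
    D3 = H - Y
    D2 = (D3 @ Theta2)[:, 1:] * A2[:, 1:] * (1 - A2[:, 1:])  # drop bias column
    grad2 = D3.T @ A2 / m
    grad1 = D2.T @ A1 / m
    return J, grad1, grad2

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = (rng.random((5, 2)) > 0.5).astype(float)
Theta1 = rng.normal(scale=0.5, size=(4, 4))
Theta2 = rng.normal(scale=0.5, size=(2, 5))

J, grad1, grad2 = cost_and_grads(Theta1, Theta2, X, Y)

# Finite-difference check of a single entry of the Theta1 gradient.
eps = 1e-5
Tp, Tm = Theta1.copy(), Theta1.copy()
Tp[2, 1] += eps
Tm[2, 1] -= eps
numeric = (cost_and_grads(Tp, Theta2, X, Y)[0]
           - cost_and_grads(Tm, Theta2, X, Y)[0]) / (2 * eps)
```

The finite-difference value needs two full forward passes per weight, while the backward pass produces every gradient entry at roughly the cost of one extra pass, which is the sense in which backpropagation is efficient.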

Unsupervised Neural Networks

Unsupervised Learning

• So far, we talked about classification

• Can also apply the algorithm to regression, or to unsupervised learning, by setting outputs to inputs

– Autoencoder

– Sparse Autoencoder

– Restricted Boltzmann Machine (RBM)

Autoencoder

• The identity function seems trivial, but…

• …by putting constraints on the hidden units, we can discover interesting structure

– Use few units, enforce sparsity, etc.

• Example:

– Learn autoencoder on 10x10 pixel patches

– Use only 50 hidden nodes

– Forces hidden nodes to “compress” data

• Similar to PCA, sparse coding, LLE, etc.
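The 10x10 patch example can be sketched as an untrained forward pass that shows the bottleneck shape. Random numbers stand in for real image patches, the weights are random rather than learned, the `W`/`b` parameter names are an assumed notation, and squared reconstruction error is an assumed choice of loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_visible, n_hidden = 10 * 10, 50     # 10x10 patches, 50 hidden nodes

# Hypothetical stand-in for image patches, one flattened patch per row.
patches = rng.random((200, n_visible))

W1 = rng.normal(scale=0.1, size=(n_hidden, n_visible))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b2 = np.zeros(n_visible)

def encode(X):
    """Map 100-pixel patches to 50 hidden activations."""
    return sigmoid(X @ W1.T + b1)

def decode(H):
    """Map 50 hidden activations back to 100 pixels."""
    return sigmoid(H @ W2.T + b2)

codes = encode(patches)
recon = decode(codes)

# The training target is the input itself.
error = np.mean(np.sum((recon - patches) ** 2, axis=1)) / 2
```

Training would adjust the weights to reduce `error`; because the code has half as many values as the patch, the hidden layer cannot simply copy its input.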

Sparse Autoencoder

• Think of a neuron as being "active" (or as "firing") if its output value is close to 1, or as being "inactive" if its output value is close to 0.

• We would like to constrain the neurons to be inactive most of the time

• Same idea as Sparse Coding! But different method

Sparse Autoencoder

• we will write $a_j^{(2)}(x)$ to denote the activation of this hidden unit when the network is given a specific input x

• the average activation of hidden unit $j$ (averaged over the training set) is $\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} a_j^{(2)}(x^{(i)})$

• We would like to (approximately) enforce the constraint $\hat\rho_j = \rho$, where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$).

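RECALL: Gradient computation

Need code to compute:

- $J(\Theta)$

- $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$

Use “Backpropagation algorithm”:

- Efficient way to compute $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$

- Computes gradient incrementally by “propagating” backwards through the network

Sparse autoencoder cost = training error + regularization + sparsity penalty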
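The slides do not spell out the form of the sparsity penalty. One common choice, used in the Stanford tutorial linked from these slides, is the KL divergence between the target sparsity and each hidden unit's average activation. The sketch below assumes that form; the weight `beta` and both function names are assumptions as well.

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j) for Bernoulli variables."""
    rho_hat = np.asarray(rho_hat, dtype=float)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparse_cost(training_error, regularization, rho, rho_hat, beta):
    """Total cost = training error + regularization + beta * sparsity penalty."""
    return training_error + regularization + beta * kl_sparsity_penalty(rho, rho_hat)

penalty_ok = kl_sparsity_penalty(0.05, [0.05, 0.05])    # units at the target
penalty_high = kl_sparsity_penalty(0.05, [0.4, 0.05])   # one unit too active
total = sparse_cost(1.0, 0.5, 0.05, [0.05], beta=3.0)
```

The penalty is zero when every unit's average activation equals the target and grows as a unit's average moves away from it, so it adds to the cost that backpropagation minimizes.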

Sparse Autoencoder

• Visualizing a sparse Autoencoder trained on 10x10 image patches

http://deeplearning.stanford.edu/wiki/index.php/Visualizing_a_Trained_Autoencoder
