
Advanced Topics


Last class

• Neural Networks


Popularity and Applications

• Major increase in the popularity of Neural Networks

• Google developed efficient methods that allow for the training of huge, deep neural networks:

– Asynchronous Distributed Gradient Descent

– L-BFGS

• These have recently been made available to the public


Large scale ANN

[Figure: a large-scale ANN, showing the model and its training data. Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]


Popularity and Applications

[Figure: two distributed training architectures. Asynchronous Distributed Gradient Descent: a parameter server holds the model while workers train on data shards and exchange parameter updates with it. L-BFGS: a coordinator exchanges small messages with workers that share the model and the data. Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]

• 20,000 cores in a single cluster

• Up to 1 billion data items per mega-batch (processed in ~1 hour)

A minimal sketch of the asynchronous, parameter-server idea is given below.
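The sketch below is only an illustration of the asynchronous-update pattern, not Google's system: Python threads stand in for machines, a toy least-squares model stands in for a deep network, and all names (ParameterServer, worker) and sizes are hypothetical.

import threading
import numpy as np

# Toy data for a least-squares model y = X w
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 1.0])
X = rng.normal(size=(4000, 3))
y = X @ true_w + 0.1 * rng.normal(size=4000)

class ParameterServer:
    """Holds the shared model; workers pull parameters and push gradients asynchronously."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        with self.lock:
            self.w -= self.lr * grad

def worker(server, shard_X, shard_y, seed, steps=500, batch=32):
    """Each worker trains on its own data shard, using possibly stale parameters."""
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        idx = local_rng.integers(0, len(shard_X), size=batch)
        w = server.pull()                       # fetch current (possibly stale) weights
        err = shard_X[idx] @ w - shard_y[idx]
        grad = shard_X[idx].T @ err / batch     # mini-batch squared-error gradient
        server.push(grad)                       # send the update back asynchronously

server = ParameterServer(dim=3)
shards = np.array_split(np.arange(len(X)), 4)   # 4 data shards -> 4 workers
threads = [threading.Thread(target=worker, args=(server, X[s], y[s], i))
           for i, s in enumerate(shards)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("learned weights:", server.pull())        # should approach [2, -3, 1]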


Neural Networks

[Figure omitted. Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]

Popularity and Applications

[Figures omitted. Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]


Learning from Unlabeled Data

[Figure: classification accuracy vs. training set size (millions) for several learning algorithms (Banko & Brill, 2001). Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]

“It’s not who has the best algorithm that wins. It’s who has the most data.”

[Course roadmap: Preliminaries → Data Understanding → Data Preprocessing → Data Modeling → Validation & Interpretation → Advanced Topics]


Feature Learning

A set of techniques in machine learning that learn a transformation of "raw" inputs into a representation that can be effectively exploited in a supervised learning task, such as classification


Feature Learning

• Can be supervised or unsupervised

• Examples of algorithms:

– Autoencoders

– Matrix Factorization

– Restricted Boltzmann machine

– Clustering

– Neural Networks



Feature Learning

• Each of the k centroids learned by k-means can be used to represent a feature for supervised learning

• Feature j has value 1 iff the j-th centroid is the closest to the instance under consideration (a minimal sketch is given below)
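As an illustration, here is a minimal sketch of this one-hot ("closest centroid") encoding using scikit-learn's KMeans; the digits dataset, k = 10, and the logistic-regression learner are arbitrary assumptions, not part of the lecture.

# Minimal sketch: turn k-means cluster assignments into one-hot features
# for a downstream supervised learner. Dataset and k are arbitrary choices.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

k = 10
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_tr)

def one_hot_features(kmeans_model, X):
    """Feature j is 1 iff centroid j is the closest to the instance, 0 otherwise."""
    closest = kmeans_model.predict(X)                  # index of the nearest centroid
    feats = np.zeros((len(X), kmeans_model.n_clusters))
    feats[np.arange(len(X)), closest] = 1.0
    return feats

clf = LogisticRegression(max_iter=1000).fit(one_hot_features(kmeans, X_tr), y_tr)
print("accuracy:", clf.score(one_hot_features(kmeans, X_te), y_te))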


Deep Learning

• Machine learning algorithms that model high-level data abstractions using a series of transformations

• Based on feature learning

• The motivation: some data representations make it easier to learn particular tasks (e.g., image classification)

• Ex.: our Assignment 2: it is hard to give your computer an image and ask "what season does it represent?", but if we can apply transformations that describe similar images with a simpler representation, we might accomplish that task


Self-Taught Learning (Unsupervised Feature Learning)

[Figure: a motorcycle classifier is trained in each setting and then tested on new images ("What is this?" → motorcycle / not a motorcycle). Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]

• Supervised learning: labeled cars and motorcycles

• Semi-supervised learning: labeled cars and motorcycles, plus unlabeled cars & motorcycles

• Self-taught learning (unsupervised feature learning): labeled cars and motorcycles, plus random unlabeled images


Self-Taught Learning

• One can always try to get more labeled data, but this can be expensive:

– Amazon Mechanical Turk

– Expert feature engineering

• The promise of self-taught learning and unsupervised feature learning: if we can get our algorithms to learn from unlabeled data, then we can easily obtain and learn from massive amounts of it

• 1 instance of unlabeled data < 1 instance of labeled data

• Billions of instances of unlabeled data >> some labeled data


Self-Taught Learning

• The idea:

– Give the algorithm a large amount of unlabeled data

– The algorithm learns a feature representation of that data

– If the end goal is to perform classification, one can find a small set of labeled instances to probe the model and adapt it to the supervised task



Autoencoders

• An artificial neural network used for learning efficient codings

• Can be used for dimensionality reduction

• Similar to a Multilayer Perceptron with an input layer and one or more hidden layers

• The differences between autoencoders and an MLP:

– Autoencoders have the same number of inputs and outputs

– Instead of predicting y, autoencoders try to reconstruct x


Autoencoders

If the hidden layers are narrower than the input layer, then the activations of the final hidden layer can be regarded as a compressed representation of the input (a minimal sketch is given below)
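A minimal sketch of such an autoencoder in PyTorch; the 64-dimensional inputs, the 16-unit hidden layer, and the random placeholder data are illustrative assumptions, not the setup used in the lecture.

# Minimal autoencoder sketch (PyTorch): same number of inputs and outputs,
# trained to reconstruct x rather than predict a label y.
# Sizes (64 -> 16 -> 64) are arbitrary assumptions for illustration.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_inputs=64, n_hidden=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_inputs)

    def forward(self, x):
        code = self.encoder(x)      # compressed representation (narrow hidden layer)
        return self.decoder(code)   # reconstruction of the input

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.rand(1000, 64)            # placeholder unlabeled data
for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)     # reconstruct x, not predict y
    loss.backward()
    optimizer.step()

with torch.no_grad():
    codes = model.encoder(X)        # 16-dim compressed features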


Self-Taught Learning in Practice

• Suppose we have an unlabeled training set {x_u^(1), x_u^(2), …, x_u^(m_u)} with m_u unlabeled instances

• Step 1: Train an autoencoder on this data


Self-Taught Learning in Practice

• After Step 1, we will have learned all weight parameters for the network

• We can also visualize the algorithm for computing the features/activations a as the following neural network (figure omitted: the encoder part of the trained autoencoder):

Data Preprocessing

Advanced Topics

Self-Taught Learning in Practice

• Step 2:

– Training set: {(x_l^(1), y^(1)), (x_l^(2), y^(2)), …, (x_l^(m_l), y^(m_l))} of m_l labeled examples

– Feed training example x_l^(1) to the autoencoder and obtain its corresponding vector of activations a_l^(1)

– Repeat that for all training examples


Self-Taught Learning in Practice

• Step 3:

– Replace the original features with the activations a_l^(i)

– The training set then becomes:

• {(a_l^(1), y^(1)), (a_l^(2), y^(2)), …, (a_l^(m_l), y^(m_l))}

• Step 4:

– Train a supervised algorithm using this new training set to obtain a function that makes predictions on the y values (a sketch of the whole pipeline is given below)
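Putting Steps 1–4 together, here is a minimal, self-contained end-to-end sketch in PyTorch; the randomly generated unlabeled/labeled data, the 64→16 encoder, and the choice of logistic regression (a linear layer with cross-entropy loss) as the supervised learner are all illustrative assumptions.

# Minimal end-to-end sketch of self-taught learning (Steps 1-4), in PyTorch.
# Data, layer sizes, and the supervised learner are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
X_unlabeled = torch.rand(5000, 64)               # large unlabeled set (m_u = 5000)
X_labeled = torch.rand(200, 64)                  # small labeled set (m_l = 200)
y_labeled = (X_labeled.mean(dim=1) > 0.5).long() # toy binary labels

# Step 1: train an autoencoder on the unlabeled data
encoder = nn.Sequential(nn.Linear(64, 16), nn.Sigmoid())
decoder = nn.Linear(16, 64)
autoencoder = nn.Sequential(encoder, decoder)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(autoencoder(X_unlabeled), X_unlabeled)
    loss.backward()
    opt.step()

# Steps 2-3: feed the labeled examples through the encoder to get activations a_l^(i)
with torch.no_grad():
    A_labeled = encoder(X_labeled)               # new 16-dim feature representation

# Step 4: train a supervised learner (here, logistic regression) on (a_l^(i), y^(i))
clf = nn.Linear(16, 2)
opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(clf(A_labeled), y_labeled)
    loss.backward()
    opt.step()

pred = clf(A_labeled).argmax(dim=1)              # predictions on the y values
print("training accuracy:", (pred == y_labeled).float().mean().item())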


An Example of Self-Taught Learning


• What Google did:

– Trained on 10 million images (YouTube)

– 1000 machines (16,000 cores) for 1 week.

– 1.15 billion parameters

– Test on novel images

[Figure: training set (YouTube) and test set (FITW + ImageNet). Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]


Result Highlights

• The face neuron

[Figure: top stimuli from the test set, and the optimal stimulus found by numerical optimization. Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]


[Figure: histogram of the face neuron's feature value, frequency of faces vs. random distractors. Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]


• The cat neuron

[Figure: the optimal stimulus found by numerical optimization. Slide credit: Machine Learning and AI via Brain simulations, Andrew Ng]



Best stimuli

[Figure: best stimuli for features 1-13 of one network layer. One layer: image size = 200, number of input channels = 3, number of output channels = 8, number of maps = 8, RF size = 18, LCN size = 5, pooling size = 5; its output (a W × H image with 8 channels) is the input to the layer above. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012]
