Building high-level features using large-scale unsupervised learning
Anh Nguyen, Bay-yuan Hsu
CS290D – Data Mining (Spring 2014), University of California, Santa Barbara
Slide adapted from Andrew Ng (Stanford), Nando de Freitas (UBC)
Agenda
1. Motivation
2. Approach
   1. Sparse Deep Auto-encoder
   2. Local Receptive Field
   3. L2 Pooling
   4. Local Contrast Normalization
   5. Overall Model
3. Parallelism
4. Evaluation
5. Discussion
1. MOTIVATION
Motivation
• Feature learning
  • Supervised learning: needs a large number of labeled examples
  • Unsupervised learning
    • Example: build a face detector without having labeled face images
• Goal: building high-level features using unlabeled data
Motivation
• Previous work: auto-encoders, sparse coding
  • Result: only learns low-level features
  • Reason: computational constraints
• This approach scales up the dataset, the model, and the computational resources
2. APPROACH
Sparse Deep Auto-encoder
• Auto-encoder
  • A neural network trained without labels to reconstruct its input
  • Trained by back-propagation
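The bullets above can be sketched as a minimal one-hidden-layer auto-encoder trained by back-propagation. The layer sizes, learning rate, and random data here are invented for illustration; this is not the paper's model.

```python
import numpy as np

# Minimal auto-encoder: encode with a sigmoid layer, decode linearly,
# and back-propagate the squared reconstruction error.
rng = np.random.default_rng(0)
n_in, n_hid, lr = 64, 16, 0.1
W1 = rng.normal(0.0, 0.1, (n_hid, n_in))   # encoder weights
W2 = rng.normal(0.0, 0.1, (n_in, n_hid))   # decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.random((200, n_in))                # stand-in for unlabeled images

def reconstruction_loss():
    return np.mean((sigmoid(X @ W1.T) @ W2.T - X) ** 2)

loss_before = reconstruction_loss()
for _ in range(100):
    H = sigmoid(X @ W1.T)                  # encode
    err = H @ W2.T - X                     # reconstruction error (decode)
    W2 -= lr * (err.T @ H) / len(X)        # gradient step on decoder
    dH = (err @ W2) * H * (1.0 - H)        # back-propagate through sigmoid
    W1 -= lr * (dH.T @ X) / len(X)         # gradient step on encoder
loss_after = reconstruction_loss()
```

No labels appear anywhere: the training signal is the input itself, which is what makes the method unsupervised.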
Sparse Deep Auto-encoder (cont'd)
• Sparse coding
  • Input: images x(1), x(2), ..., x(m)
  • Learn: bases (features) f1, f2, ..., fk so that each input x can be approximately decomposed as x ≈ Σj aj fj, where the aj are mostly zero ("sparse")
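The decomposition above is usually found by minimizing reconstruction error under an L1 sparsity penalty. A standard formulation (not necessarily the exact one behind these slides) is:

```latex
\min_{\{f_j\},\{a^{(i)}\}} \;
\sum_{i=1}^{m} \Big\| x^{(i)} - \sum_{j=1}^{k} a^{(i)}_j f_j \Big\|_2^2
\; + \; \lambda \sum_{i=1}^{m} \sum_{j=1}^{k} \big| a^{(i)}_j \big|
\qquad \text{s.t.} \quad \|f_j\|_2 \le 1
```

The L1 term drives most coefficients a_j to exactly zero, and the norm constraint on the bases prevents the penalty from being evaded by rescaling.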
Sparse Deep Auto-encoder (cont'd)
• Sparse coding
  • Regularizer encourages most coefficients to be zero
Sparse Deep Auto-encoder (cont'd)
• Sparse deep auto-encoder: multiple hidden layers, each used to achieve a particular characteristic in the learned features
Local Receptive Field
• Definition: each feature in the auto-encoder connects only to a small region of the lower layer
• Goals:
  • Learn features efficiently
  • Enable parallelism
• Training operates on small image patches
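A minimal sketch of local receptive fields, with all sizes invented for illustration: each feature's weight vector covers only one small patch of the image rather than every pixel.

```python
import numpy as np

# Local receptive fields: tile the image with small patches and give
# each feature weights only for its own patch.
rng = np.random.default_rng(0)
img_size, patch, stride = 16, 4, 4
image = rng.random((img_size, img_size))

features = []
for r in range(0, img_size - patch + 1, stride):
    for c in range(0, img_size - patch + 1, stride):
        field = image[r:r + patch, c:c + patch].ravel()
        w = rng.normal(0.0, 0.1, field.size)  # weights local to this field
        features.append(float(w @ field))     # one feature per field

features = np.array(features)
# A 4x4 grid of fields yields 16 features, each with only 16 weights,
# versus the 256 weights a fully connected feature would need.
```

Because each field's weights and computation are independent, the fields can be trained on different machines, which is the parallelism the slide refers to.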
L2 Pooling
• Goal: robustness to local distortion
• Approach: group similar features together to achieve invariance
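Concretely, an L2-pooled unit outputs the square root of the sum of squares of the simple units in its group, so the response depends on the group's total energy, not on which unit within the group fired. The group size here is illustrative.

```python
import numpy as np

# L2 pooling over fixed-size groups of simple-unit activations.
def l2_pool(h, group_size):
    groups = np.asarray(h, dtype=float).reshape(-1, group_size)
    return np.sqrt((groups ** 2).sum(axis=1))

a = l2_pool([3.0, 4.0, 0.0, 0.0], 2)   # -> [5.0, 0.0]
b = l2_pool([4.0, 3.0, 0.0, 0.0], 2)   # activation moved within the group
```

Swapping activations inside a group leaves the pooled output unchanged, which is the invariance to small local distortions that the slide describes.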
Local Contrast Normalization
• Goal: robustness to variation in light intensity
• Approach: normalize the contrast of each local region
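A simplified single-window sketch of the idea: subtract the local mean and divide by the local standard deviation, so an affine change in lighting leaves the output unchanged. (Real LCN applies this over sliding local windows.)

```python
import numpy as np

# Contrast normalization of one local window: zero-mean, unit-variance.
def contrast_normalize(patch, eps=1e-8):
    patch = np.asarray(patch, dtype=float)
    centered = patch - patch.mean()
    return centered / max(np.sqrt((centered ** 2).mean()), eps)

dark = contrast_normalize([0.1, 0.2, 0.1, 0.2])
bright = contrast_normalize([0.5, 1.0, 0.5, 1.0])  # same pattern, brighter
```

The dark and bright patches differ only by brightness and contrast scaling, so they normalize to identical outputs.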
Overall Model
• 3 layers, each with three sublayers:
  • Simple (local receptive fields): 18x18 px, 8 neurons per patch
  • Complex (L2 pooling): 5x5 px
  • LCN: 5x5 px
Overall Model
• Training: reconstruct the input of each layer
• Optimization function
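For reference, the underlying paper (Le et al.) trains each layer to reconstruct its input while keeping the pooled responses sparse; as best reconstructed from that paper (not verbatim from these slides), the per-layer objective is roughly:

```latex
\min_{W_1, W_2} \;
\sum_{i=1}^{m} \left(
\big\| W_2 W_1^{\top} x^{(i)} - x^{(i)} \big\|_2^2
\; + \; \lambda \sum_{j=1}^{k}
\sqrt{\varepsilon + H_j \big( W_1^{\top} x^{(i)} \big)^2}
\right)
```

Here W_1 are the encoding weights, W_2 the decoding weights, H_j selects the j-th pooling group, and the square-root term is exactly the L2-pooled response, so sparsity is imposed on pooled units rather than individual features.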
Overall Model
• Complex model?
3. PARALLELISM
Asynchronous SGD
• Two recent lines of research in speeding up large learning problems:
  • Parallel/distributed computing
  • Online (and mini-batch) learning algorithms: stochastic gradient descent, perceptron, MIRA, stepwise EM
• How can we bring together the benefits of parallel computing and online learning?
Asynchronous SGD
• SGD: stochastic gradient descent
  • Choose an initial parameter vector W and learning rate α
  • Repeat until an approximate minimum is obtained:
    • Randomly shuffle the examples in the training set
    • For each example, take a gradient step on W using that example alone
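The steps above become asynchronous when several workers apply them to one shared parameter vector without locking. This sketch uses a toy quadratic objective and invented sizes; it only illustrates the unsynchronized-update pattern, not the paper's actual system.

```python
import numpy as np
import threading

# Asynchronous SGD: four workers update shared parameters W with no locks.
target = np.array([3.0, -2.0])
W = np.zeros(2)                          # shared parameter vector
lr = 0.05

def worker(seed, n_steps):
    global W
    rng = np.random.default_rng(seed)    # per-worker data stream
    for _ in range(n_steps):
        x = rng.normal(size=2)
        # gradient of the per-example loss 0.5 * ((W - target) @ x)**2
        grad = ((W - target) @ x) * x
        W = W - lr * grad                # unsynchronized update

threads = [threading.Thread(target=worker, args=(s, 2000)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Occasional lost updates from racing workers only add noise to the trajectory; with a small enough learning rate the shared parameters still converge.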
Model Parallelism
• Weights are divided according to image locality and stored on different machines
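A toy sketch of that partitioning, with two simulated machines standing in for the real distributed setup (the names and sizes are invented for illustration):

```python
import numpy as np

# Model parallelism: split the weights by image locality so each
# "machine" (a dict entry here) holds only its region's weights and
# computes its features independently.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
regions = {"machine_0": image[:, :4],    # left half of the image
           "machine_1": image[:, 4:]}    # right half

local_features = {}
for name, region in regions.items():
    w = rng.normal(0.0, 0.1, region.size)     # weights stored on this machine
    local_features[name] = float(w @ region.ravel())

# Only the small feature values cross machine boundaries, not the weights.
features = np.array([local_features["machine_0"], local_features["machine_1"]])
```

Because local receptive fields keep each feature's weights confined to one region, this split sends almost no weight data between machines.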
4. EVALUATION
Evaluation
• 10M unlabeled YouTube frames of size 200x200
• 1B parameters
• 1,000 machines (16,000 cores)
Experiment on Faces
• Test set
  • 37,000 images
  • 13,026 face images
• Best neuron
Experiment on Faces (cont'd)
• Visualization
  • Top stimuli (images) for the face neuron
  • Optimal stimulus for the face neuron
Experiment on Faces (cont'd)
• Invariance properties
Experiment on Cat/Human Body
• Test set
  • Cat: 10,000 positive, 18,409 negative
  • Human body: 13,026 positive, 23,974 negative
• Accuracy
ImageNet Classification
• Task: recognizing images
• Dataset: 20,000 categories, 14M images
• Accuracy: 15.8% (previous state of the art: 9.3%)
5. DISCUSSION
Discussion
• Deep learning
  • Unsupervised feature learning
  • Learning multiple layers of representation
• Accuracy increased by invariance (pooling) and contrast normalization
• Scalability
6. REFERENCES
References
1. Quoc Le et al., "Building High-level Features using Large Scale Unsupervised Learning"
2. Nando de Freitas, "Deep Learning", URL: https://www.youtube.com/watch?v=g4ZmJJWR34Q
3. Andrew Ng, "Sparse autoencoder", URL: http://www.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf
4. Andrew Ng, "Machine Learning and AI via Brain Simulations", URL: https://forum.stanford.edu/events/2011slides/plenary/2011plenaryNg.pdf
5. Andrew Ng, "Deep Learning", URL: http://www.ipam.ucla.edu/publications/gss2012/gss2012_10595.pdf