


Page 1

Learning and Memorization
Sat Chatterjee ([email protected])¹

¹ This work was done at Two Sigma Investments.²
² Yes, it was not quite my day job.

ICML 2018

Page 2

Motivation


Neural networks can memorize large amounts of random data.

Since it is believed that memorization and generalization are incompatible...

If nets can memorize random data, why do they generalize on real data?

Understanding deep learning requires rethinking generalization. Zhang et al., ICLR '17.

[Figure: Inception model]

Page 3

One Possibility

Perhaps networks do different things with random data than with real data.


There are qualitative differences in DNN optimization behavior on real data vs. noise.

A Closer Look at Memorization in Deep Networks. Arpit et al., ICML '17.

The "hardness" of training examples is distributed differently for random and real data.

Page 4

Another Possibility

Perhaps networks do the same thing with random data as with real data.

So, if they memorize random data, is it possible that they also memorize real data?

But then how do they generalize? After all, aren't memorization and generalization at odds?

Can memorization alone lead to generalization?

Page 5

Let’s Make This Concrete

Consider this learning task (Binary MNIST):

Each pixel is quantized to 1 bit. Separate ‘0’-‘4’ from ‘5’-‘9’ (binary classification).
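As a rough sketch of this setup (a hypothetical helper: the slide specifies only 1-bit quantization, so the mid-scale threshold and the array shapes are assumptions):

```python
import numpy as np

def binarize_mnist(images, labels, threshold=128):
    """Quantize 8-bit grayscale MNIST pixels to 1 bit and relabel:
    digits 0-4 -> class 0, digits 5-9 -> class 1.
    The threshold of 128 is an assumption; the slide only says 1 bit."""
    X = (images.reshape(len(images), 28 * 28) >= threshold).astype(np.int64)
    y = (labels >= 5).astype(np.int64)
    return X, y
```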


We want to see if memorization can lead to generalization

Two obvious memorizers fall short:

1. A giant lookup table (maps a 28x28-wide bit-vector to 0 or 1), but it does not generalize.

2. Nearest neighbor, but it needs a distance (metric).

[Figure: example images from class 0 and class 1]

Page 6

Ok, so what can we do?

Instead of a single large lookup table, build a network of small lookup tables.

Each lookup table (LUT) is connected to k random outputs from the previous layer (i.e., it maps a k-wide bit-vector to 0 or 1).

k is typically less than 16.


[Figure: example network with k = 2 LUTs: input layer, first layer of lookup tables, second layer of lookup tables, output]
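To make the construction concrete, here is a minimal NumPy sketch of one plausible reading of it. The greedy layer-by-layer training pass, the majority-vote tables, and the choice to send ties and unseen patterns to 0 are assumptions about details the slide leaves open; all function names are hypothetical.

```python
import numpy as np

def train_lut_layer(X, y, n_luts, k, rng):
    """Each LUT reads k randomly chosen bits of X and memorizes, for every
    k-bit input pattern seen in training, the majority class label among
    the examples producing that pattern."""
    wires = rng.integers(0, X.shape[1], size=(n_luts, k))  # random connectivity
    powers = 1 << np.arange(k)                             # bit-vector -> table index
    tables = np.zeros((n_luts, 1 << k), dtype=np.int64)
    signed = 2 * y - 1                                     # label 1 -> +1, label 0 -> -1
    for j in range(n_luts):
        addr = X[:, wires[j]] @ powers                     # pattern index per example
        votes = np.bincount(addr, weights=signed, minlength=1 << k)
        tables[j] = (votes > 0).astype(np.int64)           # ties / unseen patterns -> 0
    return wires, powers, tables

def apply_lut_layer(X, wires, powers, tables):
    addr = X[:, wires] @ powers                  # (n_examples, n_luts) table indices
    return tables[np.arange(len(tables)), addr]  # each LUT's memorized output

def train_network(X, y, layer_sizes, k, seed=0):
    """Train greedily, layer by layer; a final single-LUT layer is the output."""
    rng = np.random.default_rng(seed)
    layers = []
    for n_luts in list(layer_sizes) + [1]:
        layer = train_lut_layer(X, y, n_luts, k, rng)
        layers.append(layer)
        X = apply_lut_layer(X, *layer)
    return layers
```

Note that training here is a single counting pass per layer: there is no gradient descent, only memorization of majority labels.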

Page 7

So, does this work? Surprisingly, yes!


5 hidden layers with 1024 LUTs in each layer; each LUT has k = 8 inputs from the previous layer.

Training accuracy = 0.89, test accuracy = 0.87 (random chance = 0.50).

Robust to randomness in topology.
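Hypothetically wiring the earlier sketch to this slide's configuration (X_train, y_train, X_test, y_test are assumed to come from the binarization snippet; running this does not guarantee reproducing the quoted numbers):

```python
def predict(X, layers):
    for layer in layers:
        X = apply_lut_layer(X, *layer)
    return X[:, 0]  # the final layer is a single LUT

layers = train_network(X_train, y_train, layer_sizes=[1024] * 5, k=8)
print("train accuracy:", (predict(X_train, layers) == y_train).mean())
print("test accuracy: ", (predict(X_test, layers) == y_test).mean())
```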

Page 8

What happens as we vary the LUT size k?

k controls "brute force" memorization: a LUT with k inputs has 2^k entries, so raw memorization capacity grows exponentially with k.

Random data is harder to memorize!

At k = 14 we see neural-network-like behavior: memorize random data, yet generalize on real data!
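One way to probe this claim with the sketch above is to rerun training on shuffled labels and sweep k (purely illustrative, reusing the hypothetical helpers defined earlier; the k = 14 behavior is the paper's observation, not something this snippet guarantees):

```python
# Shuffling labels destroys the input-label relationship but keeps the marginals.
y_rand = np.random.default_rng(1).permutation(y_train)
for k in (2, 8, 14):
    real = train_network(X_train, y_train, layer_sizes=[1024] * 5, k=k)
    rand = train_network(X_train, y_rand, layer_sizes=[1024] * 5, k=k)
    print(k,
          (predict(X_train, real) == y_train).mean(),  # train acc, real labels
          (predict(X_train, rand) == y_rand).mean())   # train acc, random labels
```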


Page 9

How does it compare to other methods?

Not state-of-the-art, but much better than chance and close to other methods.

No search, no domain-specific architecture, and no distance function.


Page 10

45 Pairwise MNIST Tasks (e.g. separate ‘3’ from ‘7’)

Similar results: k controls brute-force memorization, and small k generalizes.


Why is the variance so high?

There is more mixing with deeper networks.

But with k = 2 there is no overfitting, no matter how deep!

Page 11

Binary CIFAR-10


Page 12

Pairwise CIFAR-10


Page 13

Memorizing a line (inputs are 2 x 10-bit fixed point)


Page 14

Memorizing a circle (inputs are 2 x 10-bit fixed point)
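A plausible reconstruction of these two toy tasks (the slides specify only "2 x 10-bit fixed point" inputs, so the unit-square sampling, the particular line y = x, and the circle's center and radius are illustrative assumptions):

```python
import numpy as np

def make_task(n, label_fn, bits=10, seed=0):
    """Sample points in [0, 1)^2, encode each coordinate as a bits-wide
    fixed-point code, and unpack both codes into one 2*bits bit-vector."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, 2))
    codes = np.floor(pts * (1 << bits)).astype(np.int64)
    shifts = np.arange(bits - 1, -1, -1)                          # MSB first
    X = ((codes[:, :, None] >> shifts) & 1).reshape(n, 2 * bits)
    return X, label_fn(pts[:, 0], pts[:, 1]).astype(np.int64)

X_line, y_line = make_task(10_000, lambda x, y: y > x)            # above/below a line
X_circ, y_circ = make_task(
    10_000, lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.16)  # inside a circle
```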


Page 15

Conclusions

1. Pure memorization can lead to generalization

2. This model replicates some interesting features of neural networks:

○ Depth helps

○ It memorizes random data and yet generalizes on real data

○ Memorizing random data is harder than memorizing real data

3. So one cannot use such observations (e.g., generalization on real data) to argue that there is no memorization


Page 16

Future Directions

1. Can we understand this better theoretically?

○ Prove generalization bounds (via bounding Rademacher complexity?). Useful for small k.

○ For larger k, new ideas are needed to get non-vacuous bounds. This is similar to the problem with neural networks, but in a simpler setting (cf. recent work on margin-based analysis).

2. Can we get a practically useful learner from this?

○ Small-k networks are conservative signal hunters: if you find a signal, you know one really exists (i.e., no overfitting).

○ Currently there is no representation learning. Search over LUT functions, but with explicit control of overfitting.

○ Low-hanging fruit: distillation from quantized neural nets for fast, cheap inference with no arithmetic.


Page 17

Questions?

Answers?
