
Page 1: The Brain as a statistical Information Processor

THE BRAIN AS A STATISTICAL INFORMATION PROCESSOR

And you can too!

Page 2: The Brain as a statistical Information Processor

My History and Ours

[Timeline: 1972, 1992, 2011]

Page 3: The Brain as a statistical Information Processor

The Brain as a Statistical IP
• Introduction
• Evidence for Statistics
• Bayes Law
• Informative Priors
• Joint Models
• Inference
• Conclusion

Page 4: The Brain as a statistical Information Processor

Evidence for Statistics
Two examples that seem to indicate that the brain is indeed processing statistical information.

Page 5: The Brain as a statistical Information Processor

Statistics for Word Segmentation

Saffran, Aslin, Newport. “Statistical Learning in 8-Month-Old Infants”

The infants listen to strings of nonsense words with no auditory clues to word boundaries.

E.g., “bidakupa …” where “bidaku” is the first word.

They learn to distinguish words from other combinations that occur (with lower frequency) across word boundaries.
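The statistic the infants appear to track is the transitional probability between adjacent syllables, which is high inside a word and low across word boundaries. Below is a minimal sketch of that computation using an invented three-word nonsense lexicon (not the original stimuli).

import random
from collections import defaultdict

random.seed(0)

# Toy "speech stream": nonsense words concatenated in random order with no
# acoustic cues to word boundaries (invented corpus for illustration).
words = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
stream = []
for _ in range(200):
    stream.extend(random.choice(words))

# Count syllable unigrams and bigrams over the stream.
unigram, bigram = defaultdict(int), defaultdict(int)
for a, b in zip(stream, stream[1:]):
    unigram[a] += 1
    bigram[(a, b)] += 1

def transitional_probability(a, b):
    """P(next = b | current = a): high within a word, lower across a boundary."""
    return bigram[(a, b)] / unigram[a] if unigram[a] else 0.0

print(transitional_probability("bi", "da"))  # within a word: 1.0
print(transitional_probability("ku", "pa"))  # across a boundary: about 1/3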

Page 6: The Brain as a statistical Information Processor

They Pay More Attention to Non-Words

[Diagram: speaker, light, child]

Page 7: The Brain as a statistical Information Processor

Statistics in Low-level Vision

Based on Rosenholtz et al. (2011)

A B

Page 8: The Brain as a statistical Information Processor

Statistics in Low-level Vision

Based on Rosenholtz et al. (2011)

A N O B E L

Page 9: The Brain as a statistical Information Processor

Are summary statistics a good choice of representation?

A much better idea than spatial subsampling

Original patch ~1000 pixels

Page 10: The Brain as a statistical Information Processor

Original patch

Are summary statistics a good choice of representation?

A rich set of statistics can capture a lot of useful information

Patch synthesized to match ~1000 statistical parameters (Portilla & Simoncelli, 2000)
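The Portilla & Simoncelli model matches on the order of 1000 joint wavelet statistics; the sketch below computes just a handful of simple summary statistics for a patch, only to make concrete what "summarizing a patch by statistics rather than pixels" means (NumPy assumed; the particular statistics are chosen arbitrarily).

import numpy as np

def summary_statistics(patch: np.ndarray) -> np.ndarray:
    """Summarize a patch by a few statistics instead of its pixels:
    marginal moments plus horizontal and vertical neighbour correlations.
    (A tiny stand-in for the ~1000 joint wavelet statistics of P&S.)"""
    p = patch.astype(float)
    flat = p.ravel()
    mean, std = flat.mean(), flat.std()
    skew = ((flat - mean) ** 3).mean() / std ** 3
    corr_h = np.corrcoef(p[:, :-1].ravel(), p[:, 1:].ravel())[0, 1]
    corr_v = np.corrcoef(p[:-1, :].ravel(), p[1:, :].ravel())[0, 1]
    return np.array([mean, std, skew, corr_h, corr_v])

# A ~1000-pixel patch (32 x 32) reduced to five numbers.
patch = np.random.rand(32, 32)
print(summary_statistics(patch))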

Page 11: The Brain as a statistical Information Processor

Discrimination based on P&S stats predicts crowded letter recognition (Balas, Nakano, & Rosenholtz, JoV, 2009)

Page 12: The Brain as a statistical Information Processor

Bayes Law and Cognitive Science

To my mind, at least, it packs a lot of information

Page 13: The Brain as a statistical Information Processor

Bayes Law and Cognitive Science

P(M|E) = P(M) P(E|M) / P(E)

M = Learned Model of the world
E = Learner’s environment (sensory input)

Page 14: The Brain as a statistical Information Processor

Bayes Law: P(M|E) = P(M) P(E|M) / P(E)

• It divides up responsibility correctly.
• It requires a generative model (big, joint).
• It (obliquely) suggests that as far as learning goes we ignore the programs that use the model.

But which M?

Page 15: The Brain as a statistical Information Processor

Bayes Law Does Not Pick M
• Don’t pick M. Integrate over all of them.
• Pick the M that maximizes P(M) P(E|M).
• Pick the average P(M) (Gibbs sampling).

P(E) = Σ_M P(M) P(E|M)
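A toy numerical illustration of these options for a discrete handful of candidate models, with invented priors and likelihoods:

# Three candidate models M with invented priors P(M) and likelihoods P(E|M)
# for some fixed evidence E.
prior = {"M1": 0.5, "M2": 0.3, "M3": 0.2}
likelihood = {"M1": 0.01, "M2": 0.20, "M3": 0.05}

# Integrate over all models: P(E) = sum_M P(M) P(E|M).
p_e = sum(prior[m] * likelihood[m] for m in prior)

# Full posterior P(M|E) = P(M) P(E|M) / P(E) for every model.
posterior = {m: prior[m] * likelihood[m] / p_e for m in prior}

# Or pick the single best M: the MAP model maximizes P(M) P(E|M).
map_model = max(prior, key=lambda m: prior[m] * likelihood[m])

print(p_e)        # marginal likelihood of the evidence: 0.075
print(posterior)  # posterior over models
print(map_model)  # "M2"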

Page 16: The Brain as a statistical Information Processor

My Personal Opinion
Don’t sweat it.

Page 17: The Brain as a statistical Information Processor

Informative Priors
Three examples where they are critical

Page 18: The Brain as a statistical Information Processor

Parsing Visual Scenes (Sudderth, Jordan)

[Figure: parsed scenes with region labels such as trees, skyscraper, sky, bell, dome, temple, buildings]

Page 19: The Brain as a statistical Information Processor

Spatially Dependent Pitman-Yor

• Cut random surfaces (samples from a GP) with thresholds (as in Level Set Methods)
• Assign each pixel to the first surface which exceeds threshold (as in Layered Models)

Duan, Guindani, & Gelfand, Generalized Spatial DP, 2007
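A rough sketch of the level-set construction described above, with smoothed Gaussian noise standing in for true GP samples and arbitrary thresholds (NumPy and SciPy assumed):

import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
H = W = 64
n_layers = 4

def gp_like_surface():
    """Smoothed white noise as a stand-in for a sample from a Gaussian process."""
    s = gaussian_filter(rng.standard_normal((H, W)), sigma=8)
    return (s - s.mean()) / s.std()

surfaces = [gp_like_surface() for _ in range(n_layers)]
thresholds = [0.5, 0.3, 0.0, -np.inf]  # arbitrary; the last layer catches the rest

# Assign each pixel to the first surface that exceeds its threshold
# (the layered / level-set construction on the slide).
segmentation = np.full((H, W), n_layers - 1)
assigned = np.zeros((H, W), dtype=bool)
for k in range(n_layers):
    mask = (surfaces[k] > thresholds[k]) & ~assigned
    segmentation[mask] = k
    assigned |= mask

labels, counts = np.unique(segmentation, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))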

Page 20: The Brain as a statistical Information Processor

Samples from Spatial Prior

Comparison: Potts Markov Random Field

Page 21: The Brain as a statistical Information Processor

Prior for Word Segmentation
Based on the work of Goldwater et al.
Separate one “word” from the next in child-directed speech.

E.g., yuwanttusiD6bUk → “You want to see the book”

Page 22: The Brain as a statistical Information Processor

Bag of Words Generative Story
For each utterance:
  For each word w (or STOP): pick it with probability P(w).
  If w = STOP, break.

If we pick M to maximize P(E|M), the model memorizes the data, i.e., it creates one “word” which is the concatenation of all the words in that sentence.
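A sketch of this bag-of-words (unigram) generative story with an invented word distribution; the generated utterance is emitted without boundaries, which is all the learner gets to see.

import random

random.seed(1)

# Invented unigram word distribution; STOP ends the utterance.
p_word = {"you": 0.25, "want": 0.2, "to": 0.2, "see": 0.15,
          "the": 0.1, "book": 0.05, "STOP": 0.05}

def generate_utterance():
    """Bag-of-words generative story: draw words i.i.d. until STOP is drawn."""
    words = []
    while True:
        w = random.choices(list(p_word), weights=list(p_word.values()))[0]
        if w == "STOP":
            # The learner sees only the unsegmented string.
            return "".join(words)
        words.append(w)

# Note: with no prior, the M that maximizes P(E|M) just makes each whole
# utterance a single "word" — the memorization pathology described above.
print(generate_utterance())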

Page 23: The Brain as a statistical Information Processor

Results Using a Dirichlet Prior

Precision: 61.6 Recall: 47.6

Example: youwant to see thebook

Page 24: The Brain as a statistical Information Processor

Part-of-speech Induction
Primarily based on Clark (2003)

Given a sequence of words, deduce their parts of speech (e.g., DT, NN, etc.)

Generative story: for each word position i in the text:
  1) propose a part-of-speech t with p(t | t-1)
  2) propose a word w with p(w | t)
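That generative story is essentially a hidden Markov model. A sketch with invented two-tag transition and emission tables:

import random

random.seed(2)

# Invented transition table p(t | t-1) and emission table p(w | t).
p_tag = {"START": {"DT": 0.7, "NN": 0.3},
         "DT":    {"NN": 0.9, "DT": 0.1},
         "NN":    {"DT": 0.5, "NN": 0.5}}
p_word = {"DT": {"the": 0.6, "a": 0.4},
          "NN": {"dog": 0.5, "bone": 0.5}}

def generate(n_words):
    """For each position: propose a tag from p(t | t-1), then a word from p(w | t)."""
    prev, output = "START", []
    for _ in range(n_words):
        t = random.choices(list(p_tag[prev]), weights=list(p_tag[prev].values()))[0]
        w = random.choices(list(p_word[t]), weights=list(p_word[t].values()))[0]
        output.append((t, w))
        prev = t
    return output

print(generate(5))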

Page 25: The Brain as a statistical Information Processor

Sparse Tag Distributions
We could put a Dirichlet prior on P(w|t).
But what we really want is a sparse P(t|w): almost all words (by type) have only one part-of-speech.
We do best by only allowing this. E.g., “can” is only a modal verb (we hope!).
Putting a sparse prior on P(word-type|t) also helps.
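Why a Dirichlet prior can enforce this kind of sparsity: with a small concentration parameter, nearly all of the probability mass in each draw lands on a single tag. A quick illustration (NumPy assumed):

import numpy as np

rng = np.random.default_rng(0)

# Symmetric Dirichlet draws over 10 possible tags: a small concentration
# parameter puts almost all mass on one tag (a sparse P(t|w)),
# a large one spreads mass over many tags.
for alpha in (0.01, 1.0, 10.0):
    sample = rng.dirichlet([alpha] * 10)
    print(alpha, np.round(sample, 2))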

Page 26: The Brain as a statistical Information Processor

Joint Generative Modeling

Two examples that show the strengths of modeling many phenomena jointly.

Page 27: The Brain as a statistical Information Processor

Joint POS Tagging and Morphology

Clark’s POS tagger also includes something sort of like a morphology model.

It assumes POS tags are correlated with spelling.

True morphology would recognize that “ride,” “riding,” and “rides” share a root.

I do not know of any true joint tagging-morphology model.

Page 28: The Brain as a statistical Information Processor

Joint Reference and (Named) Entity Recognition

Based on Haghighi & Klein 2010

Weiner said the problems were all Facebook’s fault. They should never have given him an account.

Type 1 (person): Obama, Weiner, father
Type 2 (organization): IBM, Facebook, company

Page 29: The Brain as a statistical Information Processor

Inference
Otherwise known as hardware.

Page 30: The Brain as a statistical Information Processor

It is not EM
More generally it is not any mechanism that requires tracking all expectations.

Consider word boundaries: between every two phonemes there may or may not be a boundary, so an utterance with n phonemes has 2^(n-1) possible segmentations.

abcde a|bcde ab|cde abc|de abcd|e a|b|cde …

Page 31: The Brain as a statistical Information Processor

Gibbs Sampling
Start out with random guesses.
Do (roughly) forever:
  Pick a random point.
  Compute p(split) and p(join).
  Pick r, 0 < r < 1:
    if p(split) / (p(split) + p(join)) > r, split; else join.
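A sketch of this split/join sampler for one utterance. The scoring function below is an invented stand-in that merely prefers roughly three-character “words,” not the actual Goldwater-style model:

import math
import random

random.seed(3)
utterance = "youwanttoseethebook"

def segment(text, boundaries):
    """Split the text at the given set of boundary positions."""
    cuts = sorted(boundaries) + [len(text)]
    out, prev = [], 0
    for i in cuts:
        out.append(text[prev:i])
        prev = i
    return out

def log_score(boundaries):
    """Invented stand-in for log P(segmentation): prefers ~3-character words."""
    return -sum((len(w) - 3) ** 2 for w in segment(utterance, boundaries))

# Start out with random guesses about where the boundaries are.
boundaries = {i for i in range(1, len(utterance)) if random.random() < 0.5}

for _ in range(2000):
    i = random.randrange(1, len(utterance))        # pick a random point
    delta = log_score(boundaries | {i}) - log_score(boundaries - {i})
    p_split = 1.0 / (1.0 + math.exp(-delta))       # = p(split) / (p(split) + p(join))
    if random.random() < p_split:
        boundaries.add(i)
    else:
        boundaries.discard(i)

print(segment(utterance, boundaries))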

Page 32: The Brain as a statistical Information Processor

Gibbs has Very Nice Properties

Page 33: The Brain as a statistical Information Processor

Gibbs has Very Nice Properties

Page 34: The Brain as a statistical Information Processor

It is not Gibbs Either
First, the nice properties only hold for “exchangeable” distributions. It seems likely that most of the ones we care about are not (e.g., Haghighi & Klein).
But critically, it assumes we have all the training data at once and go over it many times.

Page 35: The Brain as a statistical Information Processor

It is Particle Filtering
Or something like it. At the level of detail here, just think “beam search.”
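A sketch of what “beam search” means here, applied to incremental word segmentation: only the k best partial hypotheses survive as each new symbol arrives. The scoring function is again an invented stand-in for the real model.

import heapq

def score(hyp):
    """Invented scorer: prefers completed words of about three characters."""
    words, cur = hyp
    return -sum((len(w) - 3) ** 2 for w in words) - max(0, len(cur) - 3) ** 2

def beam_segment(stream, k=3):
    """Keep only the k best partial hypotheses as each symbol arrives.
    A hypothesis is (completed_words, current_partial_word)."""
    beam = [((), "")]
    for ch in stream:
        candidates = []
        for words, cur in beam:
            candidates.append((words, cur + ch))         # continue the current word
            if cur:
                candidates.append((words + (cur,), ch))  # close it, start a new word
        beam = heapq.nlargest(k, candidates, key=score)
    words, cur = max(beam, key=score)                    # close the final word
    return words + (cur,) if cur else words

print(beam_segment("youwanttoseethebook"))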

Page 36: The Brain as a statistical Information Processor

Parsing and CKY

[Figure: parse tree for “Dogs like bones” — (S (NP (NNS Dogs)) (VP (VBS like) (NP (NNS bones)))) — with an “Information Barrier” marked across the chart]

Page 37: The Brain as a statistical Information Processor

It is Particle Filtering
Or something like it. At the level of detail here, just think “beam search.”

Beam of partial parses:
(ROOT
(ROOT (S (NP (NNS Dogs)
(ROOT (NP (NNS Dogs)
(ROOT (S (NP (NNS Dogs)) (VP (VBS eat)

Page 38: The Brain as a statistical Information Processor

Conclusion
• The brain operates by manipulating probabilities.
• World-model induction is governed by Bayes Law.
• This implies we have a large joint generative model.
• It seems overwhelmingly likely that we have a very informative prior.
• Something like particle filtering is the inference/use mechanism.

Page 39: The Brain as a statistical Information Processor

THANK YOU