
Distributed Representations: Preaching to the Choir in Church(land)
Simon D. Levy, Department of Computer Science, Washington and Lee University, Lexington, VA 24450
PHIL 395, 9 May 2006



Page 1: Distributed Representations: Simon D. Levy Department of Computer Science Washington and Lee University Lexington, VA 24450 PHIL 395 9 May 2006 Preaching

Distributed Representations:

Simon D. Levy

Department of Computer Science

Washington and Lee University

Lexington, VA 24450

PHIL 395

9 May 2006

Preaching to the Choir in Church(land)

Page 2:

Theme: A Neuro-Manifesto

Page 3:

The real motive behind eliminative materialism

is the worry that the “propositional” kinematics

and “logical” dynamics of folk psychology

constitute a radically false account of the

cognitive activity of humans, and of the higher

animals generally. The worry is that our folk

conception of how cognitive creatures represent

the world ... is a thoroughgoing

misrepresentation of what really takes

place inside us.

Page 4:

[It] turns out that we don’t think the way we think we

think! The scientific evidence coming in all around us

is clear: Symbolic conscious reasoning, which is

extracted through protocol analysis from serial verbal

introspection, is a myth. [It] is entirely clear that the

symbolic mind that AI has tried for 50 years to

simulate is just a story we humans tell ourselves to

predict and explain the

unimaginably complex processes

occurring in our evolved brains.

Page 5:

• Folk psychology representations are local: “a place

for every symbol, and every symbol in its place”.

• Neural-net representations are distributed: “each

entity is represented by a pattern of activity

distributed over many computing elements, and each

computing element is involved in representing many

different entities”. (Hinton 1984)

Local vs. Distributed Representation
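The contrast can be made concrete with a toy example (the vectors below are invented for illustration, not from the slides): a local code gives every entity its own unit, so no two entities are ever similar, while a distributed code overlaps patterns, so similarity structure comes for free.

```python
import numpy as np

# Local ("one-hot") coding: a place for every symbol, every symbol in its place.
dog_local = np.array([1, 0, 0])
cat_local = np.array([0, 1, 0])
car_local = np.array([0, 0, 1])

# Distributed coding: each entity is a pattern over many units, and each
# unit participates in many entities (values are purely illustrative).
dog_dist = np.array([0.9, 0.8, 0.1, 0.2])
cat_dist = np.array([0.8, 0.9, 0.2, 0.1])
car_dist = np.array([0.1, 0.2, 0.9, 0.8])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Local vectors are all equally dissimilar; distributed vectors place
# dog and cat close together and car far away.
print(cosine(dog_local, cat_local))  # 0.0
print(cosine(dog_dist, cat_dist))    # ≈ 0.99 (high)
print(cosine(dog_dist, car_dist))    # ≈ 0.33 (low)
```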

Page 6:

• The most common distributed representation is a vector

of real numbers.

• You already know how vectors can be obtained by

back-propagation / gradient-descent.

• Today I’ll talk about some other (faster, more

plausible) ways of obtaining the vectors.

Page 7:

Variation I: The Hard Problem

Page 8:

A typical American seventh grader knows the meaning

of 10-15 words today that she didn't know yesterday ...

The typical seventh grader would have read less than

50 paragraphs since yesterday, from which she should

have learned less than three new words.

Apparently, she mastered the meanings of many words

that she did not encounter. - Landauer 1997

Page 9:

Latent Semantic Analysis
“You shall know a word by the company it keeps”

– J. R. Firth

• Make a table showing how many times each word occurs

in each of a set of documents, or with another word, etc. -

purely local info

• Mathematically “smear” this information across each row

of the table, showing how likely the word would be to occur

in the other documents – distributed info
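A minimal sketch of the two steps above, using an invented five-word, five-document count matrix and NumPy's SVD (real LSA corpora, and the weighting applied before the SVD, are far richer; everything here is illustrative):

```python
import numpy as np

# Step 1: a tiny term-by-document count matrix (rows = words,
# columns = documents) -- purely local information.
words = ["ship", "boat", "ocean", "wood", "tree"]
counts = np.array([
    [1, 0, 1, 0, 0],   # ship
    [0, 1, 0, 0, 0],   # boat
    [1, 1, 0, 0, 0],   # ocean
    [0, 0, 1, 1, 0],   # wood
    [0, 0, 0, 1, 1],   # tree
], dtype=float)

# Step 2: "smear" the counts with a truncated SVD -- keep only the
# k strongest dimensions and reconstruct.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
smeared = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Each row of `smeared` is now a distributed representation: it assigns
# nonzero weight even to documents the word never literally occurred in.
```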

Page 10:

Landauer, T. K., Foltz, P. W., & Laham, D. (1998).

Introduction to Latent Semantic Analysis.

Discourse Processes, 25, 259-284.

Page 11:

Landauer, T. K., Foltz, P. W., & Laham, D. (1998).

Page 12:

Landauer, T. K., Foltz, P. W., & Laham, D. (1998).

Page 13:

Latent Semantic Analysis

• As in Elman’s SRN, reps. of similar concepts end up close

together in “meaning space”

• Amazingly useful

• Intelligent information retrieval: “Smart Googling”

(Berry et al. 1994)

• Automatic essay grading: “Who’s really looking at your SAT?”

(Landauer et al. 2000)

• Disambiguating words for automatic translation

(Davis & Levy 2006: http://www.cs.wlu.edu/translate)...

Page 14:

Variation II: The Harder Problem

Page 15:

The Language of Thought: Binding and Recursion

• LSA (and Elman-style hidden vectors) only give us the

representations of individual words/concepts

• Documents are just unstructured “bags of words”

• Without folk-psychological structures, how do we represent

1) the distinction between, e.g., “Lois loves Clark” and

“Clark loves Lois”?

2) intentional concepts like

“Perry knows that [Lois loves Clark]”?

Page 16:

Binding as Vector Product (Smolensky 1990)

© 2004. Indiana University and Michael Gasser.

www.cs.indiana.edu/classes/b651/Notes/convolution.html

24 Feb 2004

• Cool, but problematic, because representations keep

getting bigger...

Page 17:

Holographic Reduced Representations (Plate 1991)

• Binding by “circular convolution”: sum over diagonals with circularity to keep fixed size:

© 2004. Indiana University and Michael Gasser.

www.cs.indiana.edu/classes/b651/Notes/convolution.html

24 Feb 2004
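Circular convolution is easy to state in code. A sketch (the FFT identity used here is standard; the vector size and the Gaussian distribution with variance 1/n are illustrative conventions, not from the slides):

```python
import numpy as np

def cconv(a, b):
    # Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod n],
    # i.e. summing over "wrapped" diagonals so the result keeps the
    # same size n.  Computed via the FFT, which makes it O(n log n).
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

n = 512
rng = np.random.default_rng(0)
role = rng.normal(0, 1 / np.sqrt(n), n)    # e.g. a role vector like LOVER
filler = rng.normal(0, 1 / np.sqrt(n), n)  # e.g. a filler vector like LOIS

bound = cconv(role, filler)
# Same size as the inputs -- no blow-up, unlike the full tensor product.
```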

Page 18:

Holographic Reduced Representations (Plate 1991)

• Keeping the # of dimensions constant allows us to build intentional representations of arbitrary complexity:

KNOWER*PERRY + KNOWN*(LOVER*LOIS + LOVEE*CLARK)

• As with LSA, similar propositions end up close together in “proposition space”
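A sketch of building and querying such a representation, with randomly generated role and filler vectors (all names, dimensions, and distributions here are illustrative choices, not from Plate's paper): binding is circular convolution, unbinding is circular correlation (its approximate inverse), and a "clean-up" step compares the noisy result against a known vocabulary.

```python
import numpy as np

def cconv(a, b):
    # Circular convolution: binding.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    # Circular correlation: approximate inverse of binding (unbinding).
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

n = 1024
rng = np.random.default_rng(1)
vec = lambda: rng.normal(0, 1 / np.sqrt(n), n)
LOVER, LOVEE, KNOWER, KNOWN = vec(), vec(), vec(), vec()
LOIS, CLARK, PERRY = vec(), vec(), vec()

# "Perry knows that Lois loves Clark", all in one fixed-size vector:
loves = cconv(LOVER, LOIS) + cconv(LOVEE, CLARK)
knows = cconv(KNOWER, PERRY) + cconv(KNOWN, loves)

# Unbinding recovers a noisy copy of a filler; clean it up by
# comparing against the vocabulary.
noisy = ccorr(KNOWER, knows)
vocab = {"PERRY": PERRY, "LOIS": LOIS, "CLARK": CLARK}
best = max(vocab, key=lambda w: np.dot(vocab[w], noisy))
print(best)  # should recover "PERRY"
```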

Page 19:

Holographic Reduced Representations (Plate 1991)

• Mathematically, the same operations are used to produce holograms

Page 20:

Variation III: The Hardest Problem

Page 21:

Language

• Language is a structured relationship between a set

of structured meanings and a set of structured

utterances.

• Children acquire this mapping after exposure to a

tiny fraction of the possible meaning/utterance pairs,

and [pace Elman] with very little corrective

feedback.

Page 22:

Asking the Right Questions

• How might a language organize itself to deal with

the fact that only an infinitesimal fraction of the

possible meaning/utterance pairs will be heard by a

given speaker in their lifetime?

• How might a nervous system (synaptic weights,

topology of neurons) organize itself to match the

regularities in its environment?

Page 23:

Self-Organizing Maps (Kohonen 1984)

• Input data consisting of N-

dimensional vectors

• Nodes (units) in a 2D grid

• Each node has a synaptic weight

vector of N dimensions

• Simple, “unsupervised” learning

algorithm...

Page 24:

SOM Learning Algorithm

1. Pick an input vector at random

2. The “winning” node is the one whose weight vector is

closest to the input vector in vector space.

3. Update weights of winner and its grid neighbors

to move them closer to the input

Get Matlab code: http://www.cs.wlu.edu/~levy/som
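For readers without Matlab, the three steps above can be sketched in Python (a toy stand-in, not a transcription of the linked code; the grid size, learning-rate schedule, and Gaussian neighborhood are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)
grid_h, grid_w, dim = 10, 10, 3
weights = rng.random((grid_h, grid_w, dim))  # one N-dim weight vector per node
data = rng.random((500, dim))                # N-dimensional input vectors

# Grid coordinates of every node, for computing neighborhoods.
ii, jj = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")

def train(weights, data, epochs=20, lr0=0.5, radius0=5.0):
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                    # decaying learning rate
        radius = max(radius0 * (1 - t / epochs), 1.0)  # shrinking neighborhood
        # 1. Present the input vectors in random order.
        for x in rng.permutation(data):
            # 2. Winner = node whose weight vector is closest to the input.
            d = np.linalg.norm(weights - x, axis=2)
            wi, wj = np.unravel_index(np.argmin(d), d.shape)
            # 3. Move the winner and its grid neighbors toward the input,
            #    weighted by a Gaussian falloff over grid distance.
            grid_dist2 = (ii - wi) ** 2 + (jj - wj) ** 2
            nbhd = np.exp(-grid_dist2 / (2 * radius ** 2))
            weights = weights + lr * nbhd[:, :, None] * (x - weights)
    return weights

weights = train(weights, data)
```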

Page 25:

SOM Learning: A Two-Part Invention in Two Dimensions

Page 26:

SOM Learning: A Two-Part Invention in Two Dimensions

Page 27:

SOM Learning: A Three-Part Invention in Three Dimensions

Page 28:

SOM Learning: A Three-Part Invention in Three Dimensions

Page 29:

Self-Organizing Language

● So the grid can have any number of dimensions!

● Replace grid with high-dimensional HRR vector

● Learn to map from HRR’s for meanings to HRR’s

for utterances.

● What sort of regularities emerge?

Page 30:

Conclusions

● Distributed/vector representations can encode all

sorts of information once thought to be solely the

domain of folk psychology.

● But we will need completely new organizational

principles (holograms, deformable maps, fractals,

error gradients) to be able to tackle the really hard

problems.

Page 31:

Thank You!