
Distributed Representations: Preaching to the Choir in Church(land)
Simon D. Levy, Department of Computer Science, Washington and Lee University, Lexington, VA 24450
PHIL 395, 9 May 2006



Page 1: Distributed Representations: Simon D. Levy Department of Computer Science Washington and Lee University Lexington, VA 24450 PHIL 395 9 May 2006 Preaching

Distributed Representations:

Simon D. Levy

Department of Computer Science

Washington and Lee University

Lexington, VA 24450

PHIL 395

9 May 2006

Preaching to the Choir in Church(land)

Page 2:

Theme: A Neuro-Manifesto

Page 3:

The real motive behind eliminative materialism

is the worry that the “propositional” kinematics

and “logical” dynamics of folk psychology

constitute a radically false account of the

cognitive activity of humans, and of the higher

animals generally. The worry is that our folk

conception of how cognitive creatures represent

the world ... is a thoroughgoing

misrepresentation of what really takes

place inside us.

Page 4:

[It] turns out that we don’t think the way we think we

think! The scientific evidence coming in all around us

is clear: Symbolic conscious reasoning, which is

extracted through protocol analysis from serial verbal

introspection, is a myth. [It] is entirely clear that the

symbolic mind that AI has tried for 50 years to

simulate is just a story we humans tell ourselves to

predict and explain the

unimaginably complex processes

occurring in our evolved brains.

Page 5:

• Folk psychology representations are local: “a place

for every symbol, and every symbol in its place”.

• Neural-net representations are distributed: “each

entity is represented by a pattern of activity

distributed over many computing elements, and each

computing element is involved in representing many

different entities”. (Hinton 1984)

Local vs. Distributed Representation
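The contrast can be made concrete with a toy example (the vectors below are invented for illustration, not from the slides): a local code gives every entity its own unit, so no two entities are ever similar, while a distributed code overlaps patterns, so similarity structure comes for free.

```python
import numpy as np

# Local ("one-hot") coding: a place for every symbol, every symbol in its place.
dog_local = np.array([1, 0, 0])
cat_local = np.array([0, 1, 0])
car_local = np.array([0, 0, 1])

# Distributed coding: each entity is a pattern over many units, and each
# unit participates in many entities (values are purely illustrative).
dog_dist = np.array([0.9, 0.8, 0.1, 0.2])
cat_dist = np.array([0.8, 0.9, 0.2, 0.1])
car_dist = np.array([0.1, 0.2, 0.9, 0.8])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Local vectors are all equally dissimilar; distributed vectors place
# dog and cat close together and car far away.
print(cosine(dog_local, cat_local))  # 0.0
print(cosine(dog_dist, cat_dist))    # ≈ 0.99 (high)
print(cosine(dog_dist, car_dist))    # ≈ 0.33 (low)
```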

Page 6:

• The most common distributed representation is a vector

of real numbers.

• You already know how vectors can be obtained by

back-propagation / gradient-descent.

• Today I’ll talk about some other (faster, more

plausible) ways of obtaining the vectors.

Page 7:

Variation I: The Hard Problem

Page 8:

A typical American seventh grader knows the meaning

of 10-15 words today that she didn't know yesterday ...

The typical seventh grader would have read less than

50 paragraphs since yesterday, from which she should

have learned less than three new words.

Apparently, she mastered the meanings of many words

that she did not encounter. - Landauer 1997

Page 9:

Latent Semantic Analysis
“You shall know a word by the company it keeps”

– J. R. Firth

• Make a table showing how many times each word occurs

in each of a set of documents, or with another word, etc. -

purely local info

• Mathematically “smear” this information across each row

of the table, showing how likely the word would be to occur

in the other documents – distributed info
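A minimal sketch of the two steps above, using an invented five-word, five-document count matrix and NumPy's SVD (real LSA corpora, and the weighting applied before the SVD, are far richer; everything here is illustrative):

```python
import numpy as np

# Step 1: a tiny term-by-document count matrix (rows = words,
# columns = documents) -- purely local information.
words = ["ship", "boat", "ocean", "wood", "tree"]
counts = np.array([
    [1, 0, 1, 0, 0],   # ship
    [0, 1, 0, 0, 0],   # boat
    [1, 1, 0, 0, 0],   # ocean
    [0, 0, 1, 1, 0],   # wood
    [0, 0, 0, 1, 1],   # tree
], dtype=float)

# Step 2: "smear" the counts with a truncated SVD -- keep only the
# k strongest dimensions and reconstruct.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
smeared = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Each row of `smeared` is now a distributed representation: it assigns
# nonzero weight even to documents the word never literally occurred in.
```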

Page 10:

Landauer, T. K., Foltz, P. W., & Laham, D. (1998).

Introduction to Latent Semantic Analysis.

Discourse Processes, 25, 259-284.

Page 11:

Landauer, T. K., Foltz, P. W., & Laham, D. (1998).

Page 12:

Landauer, T. K., Foltz, P. W., & Laham, D. (1998).

Page 13:

Latent Semantic Analysis

• As in Elman’s SRN, reps. of similar concepts end up close

together in “meaning space”

• Amazingly useful

• Intelligent information retrieval: “Smart Googling”

(Berry et al. 1994)

• Automatic essay grading: “Who’s really looking at your SAT?”

(Landauer et al. 2000)

• Disambiguating words for automatic translation

(Davis & Levy 2006: http://www.cs.wlu.edu/translate)...

Page 14:

Variation II: The Harder Problem

Page 15:

The Language of Thought: Binding and Recursion

• LSA (and Elman-style hidden vectors) only give us the

representations of individual words/concepts

• Documents are just unstructured “bags of words”

• Without folk-psychological structures, how do we represent

1) the distinction between, e.g., “Lois loves Clark” and

“Clark loves Lois”?

2) intentional concepts like

“Perry knows that [Lois loves Clark]”?

Page 16:

Binding as Vector Product (Smolensky 1990)

© 2004. Indiana University and Michael Gasser.

www.cs.indiana.edu/classes/b651/Notes/convolution.html

24 Feb 2004

• Cool, but problematic, because representations keep

getting bigger...

Page 17:

Holographic Reduced Representations (Plate 1991)

• Binding by “circular convolution”: sum over diagonals with circularity to keep fixed size:

© 2004. Indiana University and Michael Gasser.

www.cs.indiana.edu/classes/b651/Notes/convolution.html

24 Feb 2004
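Circular convolution is easy to state in code. A sketch (the FFT identity used here is standard; the vector size and the Gaussian distribution with variance 1/n are illustrative conventions, not from the slides):

```python
import numpy as np

def cconv(a, b):
    # Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod n],
    # i.e. summing over "wrapped" diagonals so the result keeps the
    # same size n.  Computed via the FFT, which makes it O(n log n).
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

n = 512
rng = np.random.default_rng(0)
role = rng.normal(0, 1 / np.sqrt(n), n)    # e.g. a role vector like LOVER
filler = rng.normal(0, 1 / np.sqrt(n), n)  # e.g. a filler vector like LOIS

bound = cconv(role, filler)
# Same size as the inputs -- no blow-up, unlike the full tensor product.
```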

Page 18:

Holographic Reduced Representations (Plate 1991)

• Keeping the # of dimensions constant allows us to build intentional representations of arbitrary complexity:

KNOWER*PERRY + KNOWN*(LOVER*LOIS + LOVEE*CLARK)

• As with LSA, similar propositions end up close together in “proposition space”
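A sketch of building and querying such a representation, with randomly generated role and filler vectors (all names, dimensions, and distributions here are illustrative choices, not from Plate's paper): binding is circular convolution, unbinding is circular correlation (its approximate inverse), and a "clean-up" step compares the noisy result against a known vocabulary.

```python
import numpy as np

def cconv(a, b):
    # Circular convolution: binding.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    # Circular correlation: approximate inverse of binding (unbinding).
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

n = 1024
rng = np.random.default_rng(1)
vec = lambda: rng.normal(0, 1 / np.sqrt(n), n)
LOVER, LOVEE, KNOWER, KNOWN = vec(), vec(), vec(), vec()
LOIS, CLARK, PERRY = vec(), vec(), vec()

# "Perry knows that Lois loves Clark", all in one fixed-size vector:
loves = cconv(LOVER, LOIS) + cconv(LOVEE, CLARK)
knows = cconv(KNOWER, PERRY) + cconv(KNOWN, loves)

# Unbinding recovers a noisy copy of a filler; clean it up by
# comparing against the vocabulary.
noisy = ccorr(KNOWER, knows)
vocab = {"PERRY": PERRY, "LOIS": LOIS, "CLARK": CLARK}
best = max(vocab, key=lambda w: np.dot(vocab[w], noisy))
print(best)  # should recover "PERRY"
```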

Page 19:

Holographic Reduced Representations (Plate 1991)

• Mathematically, the same operations are used to produce holograms

Page 20:

Variation III: The Hardest Problem

Page 21:

Language

• Language is a structured relationship between a set

of structured meanings and a set of structured

utterances.

• Children acquire this mapping after exposure to a

tiny fraction of the possible meaning/utterance pairs,

and [pace Elman] with very little corrective

feedback.

Page 22:

Asking the Right Questions

• How might a language organize itself to deal with

the fact that only an infinitesimal fraction of the

possible meaning/utterance pairs will be heard by a

given speaker in their lifetime?

• How might a nervous system (synaptic weights,

topology of neurons) organize itself to match the

regularities in its environment?

Page 23:

Self-Organizing Maps (Kohonen 1984)

• Input data consisting of N-

dimensional vectors

• Nodes (units) in a 2D grid

• Each node has a synaptic weight

vector of N dimensions

• Simple, “unsupervised” learning

algorithm...

Page 24:

SOM Learning Algorithm

1. Pick an input vector at random

2. The “winning” node is the one whose weight vector is

closest to the input vector in vector space.

3. Update weights of winner and its grid neighbors

to move them closer to the input

Get Matlab code: http://www.cs.wlu.edu/~levy/som
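For readers without Matlab, the three steps above can be sketched in Python (a toy stand-in, not a transcription of the linked code; the grid size, learning-rate schedule, and Gaussian neighborhood are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)
grid_h, grid_w, dim = 10, 10, 3
weights = rng.random((grid_h, grid_w, dim))  # one N-dim weight vector per node
data = rng.random((500, dim))                # N-dimensional input vectors

# Grid coordinates of every node, for computing neighborhoods.
ii, jj = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")

def train(weights, data, epochs=20, lr0=0.5, radius0=5.0):
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                    # decaying learning rate
        radius = max(radius0 * (1 - t / epochs), 1.0)  # shrinking neighborhood
        # 1. Present the input vectors in random order.
        for x in rng.permutation(data):
            # 2. Winner = node whose weight vector is closest to the input.
            d = np.linalg.norm(weights - x, axis=2)
            wi, wj = np.unravel_index(np.argmin(d), d.shape)
            # 3. Move the winner and its grid neighbors toward the input,
            #    weighted by a Gaussian falloff over grid distance.
            grid_dist2 = (ii - wi) ** 2 + (jj - wj) ** 2
            nbhd = np.exp(-grid_dist2 / (2 * radius ** 2))
            weights = weights + lr * nbhd[:, :, None] * (x - weights)
    return weights

weights = train(weights, data)
```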

Page 25:

SOM Learning: A Two-Part Invention in Two Dimensions

Page 26:

SOM Learning: A Two-Part Invention in Two Dimensions

Page 27:

SOM Learning: A Three-Part Invention in Three Dimensions

Page 28:

SOM Learning: A Three-Part Invention in Three Dimensions

Page 29:

Self-Organizing Language

● So the grid can have any number of dimensions!

● Replace grid with high-dimensional HRR vector

● Learn to map from HRR’s for meanings to HRR’s

for utterances.

● What sort of regularities emerge?

Page 30:

Conclusions

● Distributed/vector representations can encode all

sorts of information once thought to be solely the

domain of folk psychology.

● But we will need completely new organizational

principles (holograms, deformable maps, fractals,

error gradients) to be able to tackle the really hard

problems.

Page 31:

Thank You!