The Harmonic Mind
Paul Smolensky
Cognitive Science Department, Johns Hopkins University

with: Géraldine Legendre, Alan Prince, Peter Jusczyk, Donald Mathis, Melanie Soderstrom
… and a Mystery ‘Co’-laborator
Personal Firsts thanks to SPP
• First invited talk! (& first visit to JHU, 1986)
• First public confessional: midnight thoughts of a worried connectionist (UNC, 1988)
• First generative syntax talk (Memphis, 1994)
• First attempt at stand-up comedy (Columbia, 2000)
• First rendition of a 900-page book as a graphical synopsis in PowerPoint (1 minute from now)
Advertisement
The Harmonic Mind: From neural computation to optimality-theoretic grammar
Paul Smolensky & Géraldine Legendre
Blackwell 2002 (??)
• Develops the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture
• Case study in formalist multidisciplinary cognitive science
Talk Plan
• ‘Sketch’ the ICS cognitive architecture, pointing to contributions from/to traditional disciplines
  – Theoretical work: symbolic, connectionist
  – Experimental work
• Topics of direct philosophical relevance:
  – Explanation of the productivity of cognition
  – Nativism
Mystery Quote #1
“Smolensky has recently been spending a lot of his time trying to show that, vivid first impressions to the contrary notwithstanding, some sort of connectionist cognitive architecture can indeed account for compositionality, productivity, systematicity, and the like. It turns out to be rather a long story … 185 pages … are devoted to Smolensky’s telling of it, and there appears to be no end in sight. It seems it takes a lot of squeezing to get this stone to bleed.”
Computational neuroscience → ICS

Key sources:
• Hopfield 1982, 1984
• Cohen and Grossberg 1983
• Hinton and Sejnowski 1983, 1986
• Smolensky 1983, 1986
• Geman and Geman 1984
• Golden 1986, 1988
[Diagram: a two-unit competitive net: units a1, a2 with external inputs i1 (0.6), i2 (0.5) and a mutual inhibitory connection –λ (–0.9)]

Activation dynamics:
$$\frac{da_1}{dt} = i_1 - a_1 - \lambda a_2, \qquad \frac{da_2}{dt} = i_2 - a_2 - \lambda a_1$$

Harmony:
$$H(\mathbf{a}) = a_1 i_1 + a_2 i_2 - \lambda\, a_1 a_2 - \tfrac{1}{2}\left(a_1^2 + a_2^2\right)$$

so that $d\mathbf{a}/dt = \partial H/\partial \mathbf{a}$: the spread of activation is gradient ascent in Harmony.
[Figure: Competitive Net: equilibrium activations a1 (i1 = 0.6) and a2 (i2 = 0.5)]
Processing I: Activation

[Figure: Competitive Net: time course of activations a1 and a2 as the network settles]
[Figure: Competitive Net: the activation time course of a1 and a2, and the Harmony surface H(a1, a2) that the trajectory ascends]
Processing — spreading activation — is optimization: Harmony maximization.

[Diagram: the competitive net, with inputs i1 (0.6), i2 (0.5) and mutual inhibition –λ (–0.9)]
Processing II: Optimization

Cognitive psychology → ICS

Key sources:
• Hinton & Anderson 1981
• Rumelhart, McClelland, & the PDP Group 1986

Harmony maximization is satisfaction of parallel, violable constraints:
• a1 must be active (strength: 0.6)
• a2 must be active (strength: 0.5)
• a1 and a2 must not be simultaneously active (strength: λ)

Optimal compromise: a1 = 0.79, a2 = –0.21

[Figure: the Harmony surface H(a1, a2), with its maximum at the optimal compromise]
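A quick numerical check of this, as a minimal Python sketch (assuming the dynamics and Harmony function reconstructed above; the parameter values are the slides' own):

```python
import numpy as np

# Parameter values from the slides
i = np.array([0.6, 0.5])   # external inputs i1, i2
lam = 0.9                  # inhibitory strength lambda

def harmony(a):
    # H(a) = a1*i1 + a2*i2 - lam*a1*a2 - (a1^2 + a2^2)/2
    return a @ i - lam * a[0] * a[1] - 0.5 * (a @ a)

# The activation dynamics da/dt = dH/da, integrated by Euler steps:
# gradient ascent in Harmony.
a, dt = np.zeros(2), 0.05
for _ in range(2000):
    a += dt * (i - a - lam * a[::-1])   # dH/da1 = i1 - a1 - lam*a2, etc.

print(a.round(2))            # -> [ 0.79 -0.21], the optimal compromise
print(round(harmony(a), 3))  # Harmony at the maximum
```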
Representation
Symbolic theory → ICS:
• Complex symbol structures
Generative linguistics → ICS:
• Particular linguistic representations
PDP connectionism → ICS:
• Distributed activation patterns
ICS:
• Realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units (‘tensor product representations’ etc.)
Representation
Activation patterns: cat and its constituents

[Figure: distributed activation patterns (unit area = activation level) realizing the structure [σ k [æ t]] and its constituents k/r0, æ/r01, t/r11, σ/rε]
Linguistics (markedness theory) → ICS; ICS → Generative linguistics: Optimality Theory

Key sources:
• Prince & Smolensky 1993 [ms.; Rutgers report]
• McCarthy & Prince 1993 [ms.]
• Texts: Archangeli & Langendoen 1997, Kager 1999, McCarthy 2001
• Electronic archive: rutgers/ruccs/roa.html
Constraints
Met in SPP Debate, 1988!
Constraints

NOCODA: A syllable has no coda.

[Diagram: the parse [σ k [æ t]], with the coda t marked * (violation)]

$$H\left(a_{[\sigma\, k\, [\text{æ}\, t]]}\right) = -s_{\text{NOCODA}} < 0$$
Constraint Interaction I
ICS → Grammatical theory:
• Harmonic Grammar (Legendre, Miyata & Smolensky 1990 et seq.)

$$H = \mathbf{a}^{\mathsf{T}} W \mathbf{a}$$

Constraint Interaction I

At the symbolic level, Harmony decomposes over pairs of constituents:
$$H = \sum_{i,j} H(c_i, c_j)$$
For [σ k [æ t]]: H(k/Onset, σ) is the ONSET term; H(σ, t/Coda) is the NOCODA term.

The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths.

Any formal language can be so generated.
Harmonic Grammar Parser
• Simple, comprehensible network
• Simple grammar G: X → A B; Y → B A
• Language: …

Parsing

[Diagram: top-down parsing (from X and Y down to A B / B A) and bottom-up parsing (from A B / B A up to X and Y)]
Harmonic Grammar Parser
Representations: tensor product binding (filler ⊗ role)
• Filler vectors: A, B, X, Y
• Role vectors: rε = 1; r0 = (1 1); r1 = (1 –1)
• A tree [i [j k]], with i, j, k ∊ {A, B, X, Y}, is realized as the sum of its filler/role bindings over the Depth-0 and Depth-1 units

[Diagram: the network units ①–⑫, Depth 0 vs. Depth 1]
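A small Python sketch of how such a tree is realized as a sum of filler ⊗ role bindings. The role vectors are the slide's; the one-hot filler vectors and the unbinding step are illustrative assumptions, not necessarily the talk's:

```python
import numpy as np

# Role vectors from the slide; fillers assumed one-hot for A, B, X, Y.
r_eps = np.array([1.0])                 # depth-0 (root) role
r0 = np.array([1.0, 1.0])               # depth-1 left-child role
r1 = np.array([1.0, -1.0])              # depth-1 right-child role
fillers = dict(zip("ABXY", np.eye(4)))

def tree(parent, left, right):
    """Vector realizing [parent [left right]] as a sum of tensor
    product bindings; depth-0 and depth-1 subspaces are concatenated
    (4 + 8 = 12 units)."""
    depth0 = np.kron(fillers[parent], r_eps)            # 4 units
    depth1 = (np.kron(fillers[left], r0)
              + np.kron(fillers[right], r1))            # 8 units
    return np.concatenate([depth0, depth1])

v = tree("X", "A", "B")     # the parse [X [A B]]
# Unbinding: r0 and r1 are orthogonal, so the left child is recovered
# by projecting the depth-1 part onto r0 / |r0|^2.
left = v[4:].reshape(4, 2) @ r0 / (r0 @ r0)
print(left)                 # -> [1. 0. 0. 0.], the filler vector for A
```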
Harmonic Grammar Parser
Representations:

[Figure: unit activation patterns (14 units) for X, Y, A, B, the bindings A —, — B, B —, — A, and the trees [A B], [B A], [X A B], [Y B A]]

Harmonic Grammar Parser: weight matrix for Y → B A

[Figures: weight matrices W(Y — A), W(Y B —), and their combination W(Y B A), plotted over depth-0 × depth-1 units]

H(Y, B —) > 0, H(Y, — A) > 0
Harmonic Grammar Parser: weight matrix for X → A B

[Figures: weight matrices W(X A —), W(X — B), and their combination W(X A B), plotted over depth-0 × depth-1 units; compare W(Y B A)]
Harmonic Grammar Parser: weight matrix for the entire grammar G

[Figure: W(X A B, Y B A), the weight matrix encoding both rules, plotted over depth-0 × depth-1 units]
Bottom-up Parsing
A B → X = (1 1 1 1)/2

[Figure: depth-0 unit activations over time during bottom-up parsing of A B, settling on the pattern for X; activation a and Harmony H plotted against time]
Top-down Parsing

X → A B = (1 0 –1 0 0 1 0 –1)/4

[Figure: depth-1 unit activations over time during top-down unpacking of X into A B; activation a and Harmony H plotted against time]
Explaining Productivity
Full-scale parsing of formal languages by neural-network Harmony maximization: productive competence.

How to explain?

1. Structured representations

[Figure: the unit activation patterns realizing X, Y, A, B, the bindings A —, — B, B —, — A, and the trees [A B], [B A], [X A B], [Y B A]]
+ 2. Structured connections

[Figures: the rule weight matrices W(X A —), W(X — B), W(X A B), W(Y B —), W(Y — A), W(Y B A), and the full grammar matrix W(X A B, Y B A)]
= Proof of Productivity

Productive behavior follows mathematically from combining
• the combinatorial structure of the vectorial representations encoding inputs & outputs, and
• the combinatorial structure of the weight matrices encoding knowledge.
Mystery Quote #2
“Paul Smolensky has recently announced that the problem of explaining the compositionality of concepts within a connectionist framework is solved in principle. … This sounds suspiciously like the offer of a free lunch, and it turns out, upon examination, that there is nothing to it.”
Explaining Productivity I
[Figure: the activation pattern for [X A B] as the superposition of the patterns for X, A —, and — B]

• Intra-level decomposition: [A B] → {A, B}
• Inter-level decomposition: [A B] → {1, 0, 1, …, 1}

[Diagram: in GOFAI, Semantics and Processes both run through the intra-level decomposition; in ICS, Semantics runs through the intra-level decomposition and Processes through the inter-level one]
Explaining Productivity II
• Intra-level decomposition: G → {XAB, YBA}
• Inter-level decomposition: [A B] → {1, 0, 1, …, 1}

[Diagram: as in Explaining Productivity I: GOFAI assigns Semantics and Processes to the intra-level decomposition; ICS assigns Processes to the inter-level one]

[Figure: the full weight matrix W(X A B, Y B A) as the combination of the rule matrices W(X A B) + W(Y B A)]
Mystery Quote #3
“ … even after all those pages, Smolensky hasn’t so much as made a start on constructing an alternative to the Classical account of the compositionality phenomena.”
Constraint Interaction II: OT
ICS → Grammatical theory:
• Optimality Theory (Prince & Smolensky 1993)

Constraint Interaction II: OT

Differential strength encoded in strict domination hierarchies:
• Every constraint has complete priority over all lower-ranked constraints (combined)
• ≈ ‘Take-the-best’ heuristic (Hertwig, today): constraint ↔ cue, ranking ↔ cue validity
• Decision-theoretic justification for OT?
• Approximate numerical encoding employs special (exponentially growing) weights, as sketched below
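A Python sketch of that last point: lexicographic (strict-domination) evaluation versus numerical Harmony with exponentially growing weights. The constraint names come from the stress tableau on the next slide; the violation counts (n = 5) are illustrative:

```python
# Strict domination vs. exponential weights, with made-up violation
# profiles (constraints listed highest-ranked first).
ranking = ["STRESSHEAVY", "MAINSTRESSRIGHT"]

candidates = {
    "a": {"STRESSHEAVY": 0, "MAINSTRESSRIGHT": 5},  # initial stress
    "b": {"STRESSHEAVY": 1, "MAINSTRESSRIGHT": 0},  # final stress
}

def ot_optimal(cands, ranking):
    """Strict domination = lexicographic comparison of violations."""
    return min(cands, key=lambda c: tuple(cands[c][k] for k in ranking))

def hg_optimal(cands, ranking, base=100):
    """Numerical Harmony with weights growing exponentially down the
    ranking: w_k = base**(n-1-k). For base larger than any violation
    count, this reproduces the strict-domination choice."""
    n = len(ranking)
    w = {k: base ** (n - 1 - idx) for idx, k in enumerate(ranking)}
    H = {c: -sum(w[k] * v for k, v in viol.items())
         for c, viol in cands.items()}
    return max(H, key=H.get)

print(ot_optimal(candidates, ranking))   # -> 'a'
print(hg_optimal(candidates, ranking))   # -> 'a' (same choice)
```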
Constraint Interaction II: OT

Candidates        STRESSHEAVY    MAINSTRESSRIGHT
a. σ́Hσ…σσ                        * … * (n times)    Harmony: –n·w(MAINSTRESSRIGHT)
b. σHσ…σσ́         *                                 Harmony: –w(STRESSHEAVY)

“Grammars can’t count”

With numerical weights, stress is on the initial heavy syllable iff the number of light syllables n obeys n·w(MAINSTRESSRIGHT) < w(STRESSHEAVY). No way, man: under strict domination, STRESSHEAVY ≫ MAINSTRESSRIGHT decides in favor of a for every n.
Constraint Interaction II: OT
• Constraints are universal
• Human grammars differ only in how these constraints are ranked: ‘factorial typology’
• First true contender for a formal theory of cross-linguistic typology
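A toy factorial-typology sketch in Python, using the NOCODA/FAITH pair from the next slide; the candidate set and violation profiles are illustrative:

```python
from itertools import permutations

# Input /kæt/, two candidate outputs, two universal constraints.
candidates = {
    "[kæt]": {"NOCODA": 1, "FAITH": 0},   # keeps the coda
    "[kæ]":  {"NOCODA": 0, "FAITH": 1},   # deletes it
}
constraints = ["NOCODA", "FAITH"]

def optimal(ranking):
    """OT evaluation: lexicographic minimization of violations."""
    return min(candidates,
               key=lambda c: tuple(candidates[c][k] for k in ranking))

# Every ranking of the universal constraints is a possible grammar.
for ranking in permutations(constraints):
    print(" >> ".join(ranking), "->", optimal(ranking))
# NOCODA >> FAITH -> [kæ]    (Polynesian-type)
# FAITH >> NOCODA -> [kæt]   (English-type)
```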
The Faithfulness / Markedness Dialectic
‘cat’: /kæt/ → [kæt], violating NOCODA — why?
• FAITHFULNESS requires identity
• MARKEDNESS often opposes it

Markedness–Faithfulness dialectic → diversity:
• English: FAITH ≫ NOCODA
• Polynesian: NOCODA ≫ FAITH (~French)

Another markedness constraint M: Nasal Place Agreement [‘Assimilation’] (NPA):
mb ≻ nb, ŋb (labial); nd ≻ md, ŋd (coronal); ŋg ≻ mg, ng (velar)
Nativism I: Learnability
Learning algorithm: Constraint Demotion
• Provably correct and efficient (under strong assumptions)
• Sources: Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000
• If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E
Constraint Demotion Learning

/in+possible/, before demotion (FAITH ≫ MARK):

Candidates           FAITH    MARK (NPA)
☹ ☞ E: inpossible              *
    A: impossible     *

If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E: FAITH is demoted below MARK (NPA), after which ☺ ☞ A: impossible wins.

Correctly handles a difficult case: multiple violations in E.
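A Python sketch of the demotion step, under simplifying assumptions (a stratified ranking stored as a list of sets, single violations, no cancellation of shared marks):

```python
def demote(strata, viols_E, viols_A):
    """You heard A but expected E: minimally demote each constraint
    violated by A below some constraint violated by E."""
    def stratum_of(c):
        return next(i for i, s in enumerate(strata) if c in s)

    # Highest stratum holding a constraint the expected form E violates
    target = min(stratum_of(c) for c in viols_E)
    for c in viols_A:
        if stratum_of(c) <= target:          # A's violation dominates:
            strata[stratum_of(c)].remove(c)  # demote it just below target
            if len(strata) == target + 1:
                strata.append(set())
            strata[target + 1].add(c)
    return [s for s in strata if s]          # drop emptied strata

# The /in+possible/ example from the tableau above:
strata = [{"FAITH"}, {"MARK(NPA)"}]
strata = demote(strata, viols_E={"MARK(NPA)"}, viols_A={"FAITH"})
print(strata)    # -> [{'MARK(NPA)'}, {'FAITH'}]
```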
Nativism I: Learnability
M ≫ F is learnable with /in+possible/→impossible• ‘not’ = in- except when followed by …• “exception that proves the rule”: M = NPA
M ≫ F is not learnable from data if there are no ‘exceptions’ (alternations) of this sort, e.g., if no affixes and all underlying morphemes have mp: √M and √F, no M vs. F conflict, no evidence for their ranking
Thus must have M ≫ F in the initial state, ℌ0
Nativism II: Experimental Test
Linking hypothesis: more harmonic phonological stimuli ⇒ longer listening time.

More harmonic:
• √M ≻ *M, when equal on F
• √F ≻ *F, when equal on M
• When one must be chosen, it is more harmonic to satisfy M: M ≫ F

M = Nasal Place Assimilation (NPA)

Collaborators: Peter Jusczyk, Theresa Allocco (Elliott Moreton, Karen Arnold)
4.5 Months (NPA)

[Bar charts: listening time (sec) to higher- vs. lower-Harmony stimuli, in three conditions]

• Faithfulness: um…ber…umber (higher Harmony, 15.36 s) vs. um…ber…iŋgu (lower Harmony, 12.31 s); p = .006 (11/16)
• Markedness: um…ber…umber (higher Harmony, 15.23 s) vs. un…ber…unber (lower Harmony, 12.73 s); p = .044 (11/16)
• Markedness vs. Faithfulness: un…ber…umber vs. un…ber…unber: ???
• M ≫ F: un…ber…umber (higher Harmony, 16.75 s) vs. un…ber…unber (lower Harmony, 14.01 s); p = .001 (12/16)
Nativism III: UGenome
Can we combine
• connectionist realization of harmonic grammar
• OT’s characterization of UG
to examine the biological plausibility of UG as innate knowledge?

Collaborators: Melanie Soderstrom, Donald Mathis
Nativism III: UGenome
• The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device
• Introduce an ‘abstract genome’ notion parallel to (and encoding) the ‘abstract neural network’
• Is connectionist empiricism clearly more biologically plausible than symbolic nativism? No!
The Problem
• No concrete examples of such a LAD exist
• Even highly simplified cases pose a hard problem: How can genes — which regulate production of proteins — encode symbolic principles of grammar?
Test preparation: Syllable Theory
Basic syllabification: the function ƒ: /underlying form/ → [surface form]
• Plural form of dish: /dɪš+s/ → [.dɪ.šəz.], i.e. /CVCC/ → [.CV.CVC.] with an epenthetic V

Basic CV Syllable Structure Theory
• Prince & Smolensky 1993: Chapter 6
• ‘Basic’: no more than one segment per syllable position: .(C)V(C).

Correspondence Theory
• McCarthy & Prince 1995 (‘M&P’)
• /C1V2C3C4/ → [.C1V2.C3 V C4.] (the V is epenthetic)
Syllabification: Constraints (Con)

• PARSE: Every element in the input corresponds to an element in the output (“no deletion”) [M&P: ‘MAX’]
• FILLV/C: Every output V/C segment corresponds to an input V/C segment [every syllable position in the output is filled by an input segment] (“no insertion/epenthesis”) [M&P: ‘DEP’]
• ONSET: No V without a preceding C
• NOCODA: No C without a following V
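A Python sketch evaluating these four constraints on candidate parses; the parse encoding (syllables as C/V strings, epenthetic positions lowercase, deletions counted separately) is an illustrative assumption:

```python
def violations(syllables, deleted=0):
    """Count violations of PARSE, FILL, ONSET, NOCODA for a parse."""
    v = {"PARSE": deleted, "FILL": 0, "ONSET": 0, "NOCODA": 0}
    for syl in syllables:
        v["FILL"] += sum(ch.islower() for ch in syl)  # epenthetic slots
        up = syl.upper()
        if up.startswith("V"):
            v["ONSET"] += 1      # a V with no preceding C
        if up.endswith("C"):
            v["NOCODA"] += 1     # a C with no following V
    return v

# Candidate parses of /CVCC/ (cf. the 'dishes' example above):
print(violations(["CV", "CvC"]))        # .CV.CVC. with epenthetic v
print(violations(["CVC"], deleted=1))   # .CVC. with one C unparsed
```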
SAnet architecture
/C1 C2/ → [C1 V C2]

[Diagram: input units /C1 C2/, output units [C1 V C2], and their correspondence connections]

Connection substructure:
• Local (fixed, genetically determined): content of constraint i, the coefficients c⁽ⁱ⁾
• Global (variable during learning): strength of constraint i, sᵢ

Network weight: $W_{\varphi\psi} = \sum_i s_i\, c^{(i)}_{\Phi\Psi}$
Network input: $\boldsymbol{\iota} = W\mathbf{a}$
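A minimal Python sketch of this decomposition, W = Σᵢ sᵢ c⁽ⁱ⁾, with illustrative 2×2 coefficient matrices (not the talk's actual ones):

```python
import numpy as np

# Fixed 'content' matrices c^(i), one per constraint (illustrative)
C = [np.array([[0.0, 2.0], [2.0, 0.0]]),
     np.array([[0.0, -1.0], [0.0, 0.0]])]
s = np.array([1.0, 3.0])        # learned constraint strengths s_i

W = sum(si * Ci for si, Ci in zip(s, C))   # W = sum_i s_i * c^(i)
a = np.array([1.0, 0.5])                   # current activations
iota = W @ a                               # network input to each unit
print(W, iota, sep="\n")
```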
PARSE

[Diagram: the PARSE connection substructure over C/V input, output, and correspondence units]

All connection coefficients are +2.

ONSET

[Diagram: the ONSET connection substructure]

All connection coefficients are –1.
Activation dynamics

Boltzmann Machine/Harmony Theory dynamics (temperature T → 0):
$$\Pr(a_\varphi \to 1) = f(\iota_\varphi / T), \qquad f(x) = \frac{1}{1 + e^{-x}}, \qquad \boldsymbol{\iota} = W\mathbf{a}$$
$$W_{\varphi\psi} = \sum_{i=1}^{N_{\mathrm{con}}} s_i\, c^{(i)}_{\Phi\Psi}$$
At equilibrium, $p(\mathbf{a}) \propto e^{H(\mathbf{a})/T}$.

Boltzmann-type learning dynamics

Clamped phases: P⁺ = input & output; P⁻ = input only.
$$\Delta s_i = \varepsilon\left[\,\mathrm{E}\{H_i \mid P^+\} - \mathrm{E}\{H_i \mid P^-\}\,\right]$$
During the processing of training data in phase P±, whenever unit φ (of type Φ) and unit ψ (of type Ψ) are simultaneously active, modify sᵢ by ±ε̃·c⁽ⁱ⁾_ΦΨ [ε̃ = ε/N_P]:
$$\varepsilon\,\mathrm{E}\{H_i \mid P^\pm\} = \tilde\varepsilon \sum_{p \in P^\pm} \sum_{\varphi\psi} c^{(i)}_{\Phi\Psi}\, a^{(p)}_\varphi a^{(p)}_\psi$$
This is gradient descent in the information distance
$$G = \sum_{I,O} p(I)\, \tilde p(O \mid I) \ln \frac{\tilde p(O \mid I)}{p(O \mid I)}$$
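A minimal Python sketch of these dynamics; the constraint matrices, clamped patterns, and single-sample expectation estimates are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

C = np.stack([np.array([[0.0, 2.0], [2.0, 0.0]]),     # c^(1), illustrative
              np.array([[-1.0, 0.0], [0.0, -1.0]])])  # c^(2), illustrative
s = np.array([0.5, 0.5])                              # strengths s_i

def harmony_per_constraint(a):
    # H_i(a) = (1/2) * sum_{phi,psi} c^(i)_{phi psi} a_phi a_psi
    return 0.5 * np.einsum("ipq,p,q->i", C, a, a)

def sample(a, clamped, T=1.0, sweeps=100):
    """Stochastic update: Pr(a_phi -> 1) = 1 / (1 + exp(-iota_phi/T))."""
    W = np.tensordot(s, C, axes=1)        # W = sum_i s_i c^(i)
    a = a.copy()
    for _ in range(sweeps):
        for phi in range(len(a)):
            if phi not in clamped:
                iota = W[phi] @ a
                a[phi] = float(rng.random() < 1 / (1 + np.exp(-iota / T)))
    return a

# P+ clamps input & output; P- clamps input only.
a_plus = np.array([1.0, 1.0])                        # fully clamped
a_minus = sample(np.array([1.0, 0.0]), clamped={0})
# Delta s_i = eps * [E{H_i|P+} - E{H_i|P-}] (one-sample estimate)
eps = 0.1
s = s + eps * (harmony_per_constraint(a_plus)
               - harmony_per_constraint(a_minus))
print(s)
```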
Crucial Open Question (Truth in Advertising)

• What is the relation between strict domination and neural networks?
• Apparently not a problem in the case of the CV Theory
To be encoded

• How many different kinds of units are there?
• What information is necessary (from the source unit’s point of view) to identify the location of a target unit, and the strength of the connection with it?
• How are constraints initially specified? How are they maintained through the learning process?
Unit types

• Input units: C, V
• Output units: C, V, x
• Correspondence units: C, V
• 7 distinct unit types, each represented in a distinct sub-region of the abstract genome
• We ‘help ourselves’ to implicit machinery to spell out these sub-regions as distinct cell types, located in a grid as illustrated

Connectivity geometry

• Assume a 3-d grid geometry

[Diagram: C and V cells in the grid, with axes labeled ‘E’, ‘N’, ‘back’]
Constraint: PARSE

[Diagram: the PARSE subnetwork laid out in the grid]

• Input units grow south and connect
• Output units grow east and connect
• Correspondence units grow north & west and connect with input & output units
Constraint: ONSET

• Short connections grow north–south between adjacent V output units, and between the first V node and the first x node

Direction of projection growth

• Topographic organizations are widely attested throughout neural structures (activity-dependent growth is a possible alternative)
• Orientation information (axes): chemical gradients during development (cell age is a possible alternative)

Projection parameters

• Direction
• Extent: local or non-local
• Target unit type
• Strength of connections encoded separately
Connectivity Genome

Contributions from ONSET and PARSE:

Source:    Projections:
CI         S L CC
VI         S L VC
CO         E L CC
VO         E L VC; N&S S VO
CC         N L CI; W L CO
VC         N L VI; W L VO
x0         S S VO; N S x0

Key: Direction = N(orth), S(outh), E(ast), W(est), F(ront), B(ack); Extent = L(ong), S(hort); Target = Input CI, VI; Output CO, VO, x(0); Correspondence CC, VC

ONSET’s contributions: x0 segment: S S VO | N S x0; VO segment: N&S S VO
Encoding connection strength

For each constraint i, the genome needs to ‘embody’:
• the constraint strength sᵢ
• the connection coefficients c⁽ⁱ⁾_ΦΨ (one per pair of cell types Φ, Ψ)

The product of these is constraint i’s contribution to the Φ–Ψ connection weight:
$$W_{\varphi\psi} = \sum_{i=1}^{N_{\mathrm{con}}} s_i\, c^{(i)}_{\Phi\Psi}$$
Network-level specification

Processing: constraint i contributes a weight wᵢ = [Pᵢ]·Rᵢ ∝ sᵢ·c⁽ⁱ⁾ to the Φ–Ψ connection; the total weight is W = Σᵢ wᵢ.

Development: for each constraint i, genes Gᵢ specify receptor (Rᵢ) and ligand (Lᵢ) molecules realizing the coefficients c⁽ⁱ⁾.

Learning: when units φ and ψ are simultaneously active, [Pᵢ] changes by an amount ∝ KᵢLᵢ ∝ c⁽ⁱ⁾ (during phase P⁺; reverse during P⁻).
Learning Behavior
• The simplified system can be solved analytically
• The learning algorithm turns out to reduce to: Δsᵢ ∝ [# violations of constraint i, P⁻]
Abstract Gene Map

General Developmental Machinery | Connectivity | Constraint Coefficients

Connectivity region (direction, extent, target):
• C-I: S L CC
• V-I: S L VC
• C-C: F S VC; N/E L CC&VC; S/W L CC&VC

Constraint-coefficient region:
• G: CO&V&x B 1; CC&VC B 2
• CORRESPOND: CC: CI&CO 1; VC: VI&VO 1
Summary
Described an attempt to integrate
• connectionist theory of mental processes (computational neuroscience, cognitive psychology)
• symbolic theory of mental functions (philosophy, linguistics)
  – Representations: general structure (philosophy, AI); specific structure (linguistics)

Informs the theory of UG
• Form, content
• Genetic encoding
Mystery Quote #4
“Smolensky, it would appear, would like a special dispensation for connectionist cognitive science to get the goodness out of Classical constituents without actually admitting that there are any.”
Mystery Quote #5
“ The view that the goal of connectionist research should be to replace other methodologies may represent a naive form of eliminative reductionism. … The goal … should not be to replace symbolic cognitive science, but rather to explain the strengths and weaknesses of existing symbolic theory; to explain how symbolic computation can emerge out of non‑symbolic computation; to enrich conceptual‑level research with new computational concepts and techniques that reflect an understanding of how conceptual‑level theoretical constructs emerge from subconceptual computation…”
Thanks for your attention