The Harmonic Mind
Paul Smolensky
Cognitive Science Department, Johns Hopkins University

with: Géraldine Legendre, Alan Prince, Peter Jusczyk, Donald Mathis, Melanie Soderstrom
… and a Mystery ‘Co’-laborator
Personal Firsts thanks to SPP
• First invited talk! (& first visit to JHU, 1986)
• First public confessional: midnight thoughts of a worried connectionist (UNC, 1988)
• First generative syntax talk (Memphis, 1994)
• First attempt at stand-up comedy (Columbia, 2000)
• First rendition of a 900-page book as a graphical synopsis in PowerPoint (1 minute from now)
Advertisement
The Harmonic Mind: From neural computation to optimality-theoretic grammar
Paul Smolensky & Géraldine Legendre
Blackwell 2002 (??)
• Develops the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture
• Case study in formalist multidisciplinary cognitive science
Talk Plan
• ‘Sketch’ the ICS cognitive architecture, pointing to contributions from/to traditional disciplines
  – Theoretical work: symbolic, connectionist
  – Experimental work
• Topics of direct philosophical relevance:
  – Explanation of the productivity of cognition
  – Nativism
Mystery Quote #1
“Smolensky has recently been spending a lot of his time trying to show that, vivid first impressions to the contrary notwithstanding, some sort of connectionist cognitive architecture can indeed account for compositionality, productivity, systematicity, and the like. It turns out to be rather a long story … 185 pages … are devoted to Smolensky’s telling of it, and there appears to be no end in sight. It seems it takes a lot of squeezing to get this stone to bleed.”
Computational neuroscience → ICS

Key sources:
• Hopfield 1982, 1984
• Cohen and Grossberg 1983
• Hinton and Sejnowski 1983, 1986
• Smolensky 1983, 1986
• Geman and Geman 1984
• Golden 1986, 1988
[Diagram: a two-unit competitive net: units a1, a2 with external inputs i1 (0.6), i2 (0.5) and a mutual inhibitory connection –λ (–0.9)]

Activation dynamics:
$$\frac{da_1}{dt} = i_1 - a_1 - \lambda a_2, \qquad \frac{da_2}{dt} = i_2 - a_2 - \lambda a_1$$

Harmony:
$$H(\mathbf{a}) = a_1 i_1 + a_2 i_2 - \lambda\, a_1 a_2 - \tfrac{1}{2}\left(a_1^2 + a_2^2\right)$$

so that $d\mathbf{a}/dt = \partial H/\partial \mathbf{a}$: the spread of activation is gradient ascent in Harmony.
[Figure: Competitive Net: equilibrium activations a1 (i1 = 0.6) and a2 (i2 = 0.5)]
Processing I: Activation

[Figure: Competitive Net: time course of activations a1 and a2 as the network settles]
[Figure: Competitive Net: the activation time course of a1 and a2, and the Harmony surface H(a1, a2) that the trajectory ascends]
Processing — spreading activation — is optimization: Harmony maximization.

[Diagram: the competitive net, with inputs i1 (0.6), i2 (0.5) and mutual inhibition –λ (–0.9)]
Processing II: Optimization

Cognitive psychology → ICS

Key sources:
• Hinton & Anderson 1981
• Rumelhart, McClelland, & the PDP Group 1986

Harmony maximization is satisfaction of parallel, violable constraints:
• a1 must be active (strength: 0.6)
• a2 must be active (strength: 0.5)
• a1 and a2 must not be simultaneously active (strength: λ)

Optimal compromise: a1 = 0.79, a2 = –0.21

[Figure: the Harmony surface H(a1, a2), with its maximum at the optimal compromise]
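A quick numerical check of this, as a minimal Python sketch (assuming the dynamics and Harmony function reconstructed above; the parameter values are the slides' own):

```python
import numpy as np

# Parameter values from the slides
i = np.array([0.6, 0.5])   # external inputs i1, i2
lam = 0.9                  # inhibitory strength lambda

def harmony(a):
    # H(a) = a1*i1 + a2*i2 - lam*a1*a2 - (a1^2 + a2^2)/2
    return a @ i - lam * a[0] * a[1] - 0.5 * (a @ a)

# The activation dynamics da/dt = dH/da, integrated by Euler steps:
# gradient ascent in Harmony.
a, dt = np.zeros(2), 0.05
for _ in range(2000):
    a += dt * (i - a - lam * a[::-1])   # dH/da1 = i1 - a1 - lam*a2, etc.

print(a.round(2))            # -> [ 0.79 -0.21], the optimal compromise
print(round(harmony(a), 3))  # Harmony at the maximum
```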
Representation
Symbolic theory → ICS:
• Complex symbol structures
Generative linguistics → ICS:
• Particular linguistic representations
PDP connectionism → ICS:
• Distributed activation patterns
ICS:
• Realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units (‘tensor product representations’ etc.)
Representation
Activation patterns: cat and its constituents

[Figure: distributed activation patterns (unit area = activation level) realizing the structure [σ k [æ t]] and its constituents k/r0, æ/r01, t/r11, σ/rε]
Linguistics (markedness theory) → ICS; ICS → Generative linguistics: Optimality Theory

Key sources:
• Prince & Smolensky 1993 [ms.; Rutgers report]
• McCarthy & Prince 1993 [ms.]
• Texts: Archangeli & Langendoen 1997, Kager 1999, McCarthy 2001
• Electronic archive: rutgers/ruccs/roa.html
Constraints
Met in SPP Debate, 1988!
Constraints

NOCODA: A syllable has no coda.

[Diagram: the parse [σ k [æ t]], with the coda t marked * (violation)]

$$H\left(a_{[\sigma\, k\, [\text{æ}\, t]]}\right) = -s_{\text{NOCODA}} < 0$$
Constraint Interaction I
ICS → Grammatical theory:
• Harmonic Grammar (Legendre, Miyata & Smolensky 1990 et seq.)

$$H = \mathbf{a}^{\mathsf{T}} W \mathbf{a}$$

Constraint Interaction I

At the symbolic level, Harmony decomposes over pairs of constituents:
$$H = \sum_{i,j} H(c_i, c_j)$$
For [σ k [æ t]]: H(k/Onset, σ) is the ONSET term; H(σ, t/Coda) is the NOCODA term.

The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths.

Any formal language can be so generated.
Harmonic Grammar Parser
• Simple, comprehensible network
• Simple grammar G: X → A B; Y → B A
• Language: …

Parsing

[Diagram: top-down parsing (from X and Y down to A B / B A) and bottom-up parsing (from A B / B A up to X and Y)]
Harmonic Grammar Parser
Representations: tensor product binding (filler ⊗ role)
• Filler vectors: A, B, X, Y
• Role vectors: rε = 1; r0 = (1 1); r1 = (1 –1)
• A tree [i [j k]], with i, j, k ∊ {A, B, X, Y}, is realized as the sum of its filler/role bindings over the Depth-0 and Depth-1 units

[Diagram: the network units ①–⑫, Depth 0 vs. Depth 1]
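A small Python sketch of how such a tree is realized as a sum of filler ⊗ role bindings. The role vectors are the slide's; the one-hot filler vectors and the unbinding step are illustrative assumptions, not necessarily the talk's:

```python
import numpy as np

# Role vectors from the slide; fillers assumed one-hot for A, B, X, Y.
r_eps = np.array([1.0])                 # depth-0 (root) role
r0 = np.array([1.0, 1.0])               # depth-1 left-child role
r1 = np.array([1.0, -1.0])              # depth-1 right-child role
fillers = dict(zip("ABXY", np.eye(4)))

def tree(parent, left, right):
    """Vector realizing [parent [left right]] as a sum of tensor
    product bindings; depth-0 and depth-1 subspaces are concatenated
    (4 + 8 = 12 units)."""
    depth0 = np.kron(fillers[parent], r_eps)            # 4 units
    depth1 = (np.kron(fillers[left], r0)
              + np.kron(fillers[right], r1))            # 8 units
    return np.concatenate([depth0, depth1])

v = tree("X", "A", "B")     # the parse [X [A B]]
# Unbinding: r0 and r1 are orthogonal, so the left child is recovered
# by projecting the depth-1 part onto r0 / |r0|^2.
left = v[4:].reshape(4, 2) @ r0 / (r0 @ r0)
print(left)                 # -> [1. 0. 0. 0.], the filler vector for A
```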
Harmonic Grammar Parser
Representations:

[Figure: unit activation patterns (14 units) for X, Y, A, B, the bindings A —, — B, B —, — A, and the trees [A B], [B A], [X A B], [Y B A]]

Harmonic Grammar Parser: weight matrix for Y → B A

[Figures: weight matrices W(Y — A), W(Y B —), and their combination W(Y B A), plotted over depth-0 × depth-1 units]

H(Y, B —) > 0, H(Y, — A) > 0
Harmonic Grammar Parser: weight matrix for X → A B

[Figures: weight matrices W(X A —), W(X — B), and their combination W(X A B), plotted over depth-0 × depth-1 units; compare W(Y B A)]
Harmonic Grammar Parser: weight matrix for the entire grammar G

[Figure: W(X A B, Y B A), the weight matrix encoding both rules, plotted over depth-0 × depth-1 units]
Bottom-up Parsing
A B → X = (1 1 1 1)/2

[Figure: depth-0 unit activations over time during bottom-up parsing of A B, settling on the pattern for X; activation a and Harmony H plotted against time]
Top-down Parsing

X → A B = (1 0 –1 0 0 1 0 –1)/4

[Figure: depth-1 unit activations over time during top-down unpacking of X into A B; activation a and Harmony H plotted against time]
Explaining Productivity
Full-scale parsing of formal languages by neural-network Harmony maximization: productive competence.

How to explain?

1. Structured representations

[Figure: the unit activation patterns realizing X, Y, A, B, the bindings A —, — B, B —, — A, and the trees [A B], [B A], [X A B], [Y B A]]
+ 2. Structured connections

[Figures: the rule weight matrices W(X A —), W(X — B), W(X A B), W(Y B —), W(Y — A), W(Y B A), and the full grammar matrix W(X A B, Y B A)]
= Proof of Productivity

Productive behavior follows mathematically from combining
• the combinatorial structure of the vectorial representations encoding inputs & outputs, and
• the combinatorial structure of the weight matrices encoding knowledge.
Mystery Quote #2
“Paul Smolensky has recently announced that the problem of explaining the compositionality of concepts within a connectionist framework is solved in principle. … This sounds suspiciously like the offer of a free lunch, and it turns out, upon examination, that there is nothing to it.”
Explaining Productivity I
[Figure: the activation pattern for [X A B] as the superposition of the patterns for X, A —, and — B]

• Intra-level decomposition: [A B] → {A, B}
• Inter-level decomposition: [A B] → {1, 0, 1, …, 1}

[Diagram: in GOFAI, Semantics and Processes both run through the intra-level decomposition; in ICS, Semantics runs through the intra-level decomposition and Processes through the inter-level one]
Explaining Productivity II
• Intra-level decomposition: G → {XAB, YBA}
• Inter-level decomposition: [A B] → {1, 0, 1, …, 1}

[Diagram: as in Explaining Productivity I: GOFAI assigns Semantics and Processes to the intra-level decomposition; ICS assigns Processes to the inter-level one]

[Figure: the full weight matrix W(X A B, Y B A) as the combination of the rule matrices W(X A B) + W(Y B A)]
Mystery Quote #3
“ … even after all those pages, Smolensky hasn’t so much as made a start on constructing an alternative to the Classical account of the compositionality phenomena.”
Constraint Interaction II: OT
ICS → Grammatical theory:
• Optimality Theory (Prince & Smolensky 1993)

Constraint Interaction II: OT

Differential strength encoded in strict domination hierarchies:
• Every constraint has complete priority over all lower-ranked constraints (combined)
• ≈ ‘Take-the-best’ heuristic (Hertwig, today): constraint ↔ cue, ranking ↔ cue validity
• Decision-theoretic justification for OT?
• Approximate numerical encoding employs special (exponentially growing) weights, as sketched below
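A Python sketch of that last point: lexicographic (strict-domination) evaluation versus numerical Harmony with exponentially growing weights. The constraint names come from the stress tableau on the next slide; the violation counts (n = 5) are illustrative:

```python
# Strict domination vs. exponential weights, with made-up violation
# profiles (constraints listed highest-ranked first).
ranking = ["STRESSHEAVY", "MAINSTRESSRIGHT"]

candidates = {
    "a": {"STRESSHEAVY": 0, "MAINSTRESSRIGHT": 5},  # initial stress
    "b": {"STRESSHEAVY": 1, "MAINSTRESSRIGHT": 0},  # final stress
}

def ot_optimal(cands, ranking):
    """Strict domination = lexicographic comparison of violations."""
    return min(cands, key=lambda c: tuple(cands[c][k] for k in ranking))

def hg_optimal(cands, ranking, base=100):
    """Numerical Harmony with weights growing exponentially down the
    ranking: w_k = base**(n-1-k). For base larger than any violation
    count, this reproduces the strict-domination choice."""
    n = len(ranking)
    w = {k: base ** (n - 1 - idx) for idx, k in enumerate(ranking)}
    H = {c: -sum(w[k] * v for k, v in viol.items())
         for c, viol in cands.items()}
    return max(H, key=H.get)

print(ot_optimal(candidates, ranking))   # -> 'a'
print(hg_optimal(candidates, ranking))   # -> 'a' (same choice)
```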
Constraint Interaction II: OT

Candidates        STRESSHEAVY    MAINSTRESSRIGHT
a. σ́Hσ…σσ                        * … * (n times)    Harmony: –n·w(MAINSTRESSRIGHT)
b. σHσ…σσ́         *                                 Harmony: –w(STRESSHEAVY)

“Grammars can’t count”

With numerical weights, stress is on the initial heavy syllable iff the number of light syllables n obeys n·w(MAINSTRESSRIGHT) < w(STRESSHEAVY). No way, man: under strict domination, STRESSHEAVY ≫ MAINSTRESSRIGHT decides in favor of a for every n.
Constraint Interaction II: OT
• Constraints are universal
• Human grammars differ only in how these constraints are ranked: ‘factorial typology’
• First true contender for a formal theory of cross-linguistic typology
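A toy factorial-typology sketch in Python, using the NOCODA/FAITH pair from the next slide; the candidate set and violation profiles are illustrative:

```python
from itertools import permutations

# Input /kæt/, two candidate outputs, two universal constraints.
candidates = {
    "[kæt]": {"NOCODA": 1, "FAITH": 0},   # keeps the coda
    "[kæ]":  {"NOCODA": 0, "FAITH": 1},   # deletes it
}
constraints = ["NOCODA", "FAITH"]

def optimal(ranking):
    """OT evaluation: lexicographic minimization of violations."""
    return min(candidates,
               key=lambda c: tuple(candidates[c][k] for k in ranking))

# Every ranking of the universal constraints is a possible grammar.
for ranking in permutations(constraints):
    print(" >> ".join(ranking), "->", optimal(ranking))
# NOCODA >> FAITH -> [kæ]    (Polynesian-type)
# FAITH >> NOCODA -> [kæt]   (English-type)
```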
The Faithfulness / Markedness Dialectic
‘cat’: /kæt/ → [kæt], violating NOCODA — why?
• FAITHFULNESS requires identity
• MARKEDNESS often opposes it

Markedness–Faithfulness dialectic → diversity:
• English: FAITH ≫ NOCODA
• Polynesian: NOCODA ≫ FAITH (~French)

Another markedness constraint M: Nasal Place Agreement [‘Assimilation’] (NPA):
mb ≻ nb, ŋb (labial); nd ≻ md, ŋd (coronal); ŋg ≻ mg, ng (velar)
Nativism I: Learnability
Learning algorithm: Constraint Demotion
• Provably correct and efficient (under strong assumptions)
• Sources: Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000
• If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E
Constraint Demotion Learning

/in+possible/, before demotion (FAITH ≫ MARK):

Candidates           FAITH    MARK (NPA)
☹ ☞ E: inpossible              *
    A: impossible     *

If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E: FAITH is demoted below MARK (NPA), after which ☺ ☞ A: impossible wins.

Correctly handles a difficult case: multiple violations in E.
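A Python sketch of the demotion step, under simplifying assumptions (a stratified ranking stored as a list of sets, single violations, no cancellation of shared marks):

```python
def demote(strata, viols_E, viols_A):
    """You heard A but expected E: minimally demote each constraint
    violated by A below some constraint violated by E."""
    def stratum_of(c):
        return next(i for i, s in enumerate(strata) if c in s)

    # Highest stratum holding a constraint the expected form E violates
    target = min(stratum_of(c) for c in viols_E)
    for c in viols_A:
        if stratum_of(c) <= target:          # A's violation dominates:
            strata[stratum_of(c)].remove(c)  # demote it just below target
            if len(strata) == target + 1:
                strata.append(set())
            strata[target + 1].add(c)
    return [s for s in strata if s]          # drop emptied strata

# The /in+possible/ example from the tableau above:
strata = [{"FAITH"}, {"MARK(NPA)"}]
strata = demote(strata, viols_E={"MARK(NPA)"}, viols_A={"FAITH"})
print(strata)    # -> [{'MARK(NPA)'}, {'FAITH'}]
```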
Nativism I: Learnability
M ≫ F is learnable with /in+possible/→impossible• ‘not’ = in- except when followed by …• “exception that proves the rule”: M = NPA
M ≫ F is not learnable from data if there are no ‘exceptions’ (alternations) of this sort, e.g., if no affixes and all underlying morphemes have mp: √M and √F, no M vs. F conflict, no evidence for their ranking
Thus must have M ≫ F in the initial state, ℌ0
Nativism II: Experimental Test
Linking hypothesis: more harmonic phonological stimuli ⇒ longer listening time.

More harmonic:
• √M ≻ *M, when equal on F
• √F ≻ *F, when equal on M
• When one must be chosen, it is more harmonic to satisfy M: M ≫ F

M = Nasal Place Assimilation (NPA)

Collaborators: Peter Jusczyk, Theresa Allocco (Elliott Moreton, Karen Arnold)
4.5 Months (NPA)

[Bar charts: listening time (sec) to higher- vs. lower-Harmony stimuli, in three conditions]

• Faithfulness: um…ber…umber (higher Harmony, 15.36 s) vs. um…ber…iŋgu (lower Harmony, 12.31 s); p = .006 (11/16)
• Markedness: um…ber…umber (higher Harmony, 15.23 s) vs. un…ber…unber (lower Harmony, 12.73 s); p = .044 (11/16)
• Markedness vs. Faithfulness: un…ber…umber vs. un…ber…unber: ???
• M ≫ F: un…ber…umber (higher Harmony, 16.75 s) vs. un…ber…unber (lower Harmony, 14.01 s); p = .001 (12/16)
Nativism III: UGenome
Can we combine
• connectionist realization of harmonic grammar
• OT’s characterization of UG
to examine the biological plausibility of UG as innate knowledge?

Collaborators: Melanie Soderstrom, Donald Mathis
Nativism III: UGenome
• The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device
• Introduce an ‘abstract genome’ notion parallel to (and encoding) the ‘abstract neural network’
• Is connectionist empiricism clearly more biologically plausible than symbolic nativism? No!
The Problem
• No concrete examples of such a LAD exist
• Even highly simplified cases pose a hard problem: How can genes — which regulate production of proteins — encode symbolic principles of grammar?
Test preparation: Syllable Theory
Basic syllabification: the function ƒ: /underlying form/ → [surface form]
• Plural form of dish: /dɪš+s/ → [.dɪ.šəz.], i.e. /CVCC/ → [.CV.CVC.] with an epenthetic V

Basic CV Syllable Structure Theory
• Prince & Smolensky 1993: Chapter 6
• ‘Basic’: no more than one segment per syllable position: .(C)V(C).

Correspondence Theory
• McCarthy & Prince 1995 (‘M&P’)
• /C1V2C3C4/ → [.C1V2.C3 V C4.] (the V is epenthetic)
Syllabification: Constraints (Con)

• PARSE: Every element in the input corresponds to an element in the output (“no deletion”) [M&P: ‘MAX’]
• FILLV/C: Every output V/C segment corresponds to an input V/C segment [every syllable position in the output is filled by an input segment] (“no insertion/epenthesis”) [M&P: ‘DEP’]
• ONSET: No V without a preceding C
• NOCODA: No C without a following V
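A Python sketch evaluating these four constraints on candidate parses; the parse encoding (syllables as C/V strings, epenthetic positions lowercase, deletions counted separately) is an illustrative assumption:

```python
def violations(syllables, deleted=0):
    """Count violations of PARSE, FILL, ONSET, NOCODA for a parse."""
    v = {"PARSE": deleted, "FILL": 0, "ONSET": 0, "NOCODA": 0}
    for syl in syllables:
        v["FILL"] += sum(ch.islower() for ch in syl)  # epenthetic slots
        up = syl.upper()
        if up.startswith("V"):
            v["ONSET"] += 1      # a V with no preceding C
        if up.endswith("C"):
            v["NOCODA"] += 1     # a C with no following V
    return v

# Candidate parses of /CVCC/ (cf. the 'dishes' example above):
print(violations(["CV", "CvC"]))        # .CV.CVC. with epenthetic v
print(violations(["CVC"], deleted=1))   # .CVC. with one C unparsed
```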
SAnet architecture
/C1 C2/ → [C1 V C2]

[Diagram: input units /C1 C2/, output units [C1 V C2], and their correspondence connections]

Connection substructure:
• Local (fixed, genetically determined): content of constraint i, the coefficients c⁽ⁱ⁾
• Global (variable during learning): strength of constraint i, sᵢ

Network weight: $W_{\varphi\psi} = \sum_i s_i\, c^{(i)}_{\Phi\Psi}$
Network input: $\boldsymbol{\iota} = W\mathbf{a}$
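A minimal Python sketch of this decomposition, W = Σᵢ sᵢ c⁽ⁱ⁾, with illustrative 2×2 coefficient matrices (not the talk's actual ones):

```python
import numpy as np

# Fixed 'content' matrices c^(i), one per constraint (illustrative)
C = [np.array([[0.0, 2.0], [2.0, 0.0]]),
     np.array([[0.0, -1.0], [0.0, 0.0]])]
s = np.array([1.0, 3.0])        # learned constraint strengths s_i

W = sum(si * Ci for si, Ci in zip(s, C))   # W = sum_i s_i * c^(i)
a = np.array([1.0, 0.5])                   # current activations
iota = W @ a                               # network input to each unit
print(W, iota, sep="\n")
```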
PARSE

[Diagram: the PARSE connection substructure over C/V input, output, and correspondence units]

All connection coefficients are +2.

ONSET

[Diagram: the ONSET connection substructure]

All connection coefficients are –1.
Activation dynamics

Boltzmann Machine/Harmony Theory dynamics (temperature T → 0):
$$\Pr(a_\varphi \to 1) = f(\iota_\varphi / T), \qquad f(x) = \frac{1}{1 + e^{-x}}, \qquad \boldsymbol{\iota} = W\mathbf{a}$$
$$W_{\varphi\psi} = \sum_{i=1}^{N_{\mathrm{con}}} s_i\, c^{(i)}_{\Phi\Psi}$$
At equilibrium, $p(\mathbf{a}) \propto e^{H(\mathbf{a})/T}$.

Boltzmann-type learning dynamics

Clamped phases: P⁺ = input & output; P⁻ = input only.
$$\Delta s_i = \varepsilon\left[\,\mathrm{E}\{H_i \mid P^+\} - \mathrm{E}\{H_i \mid P^-\}\,\right]$$
During the processing of training data in phase P±, whenever unit φ (of type Φ) and unit ψ (of type Ψ) are simultaneously active, modify sᵢ by ±ε̃·c⁽ⁱ⁾_ΦΨ [ε̃ = ε/N_P]:
$$\varepsilon\,\mathrm{E}\{H_i \mid P^\pm\} = \tilde\varepsilon \sum_{p \in P^\pm} \sum_{\varphi\psi} c^{(i)}_{\Phi\Psi}\, a^{(p)}_\varphi a^{(p)}_\psi$$
This is gradient descent in the information distance
$$G = \sum_{I,O} p(I)\, \tilde p(O \mid I) \ln \frac{\tilde p(O \mid I)}{p(O \mid I)}$$
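A minimal Python sketch of these dynamics; the constraint matrices, clamped patterns, and single-sample expectation estimates are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

C = np.stack([np.array([[0.0, 2.0], [2.0, 0.0]]),     # c^(1), illustrative
              np.array([[-1.0, 0.0], [0.0, -1.0]])])  # c^(2), illustrative
s = np.array([0.5, 0.5])                              # strengths s_i

def harmony_per_constraint(a):
    # H_i(a) = (1/2) * sum_{phi,psi} c^(i)_{phi psi} a_phi a_psi
    return 0.5 * np.einsum("ipq,p,q->i", C, a, a)

def sample(a, clamped, T=1.0, sweeps=100):
    """Stochastic update: Pr(a_phi -> 1) = 1 / (1 + exp(-iota_phi/T))."""
    W = np.tensordot(s, C, axes=1)        # W = sum_i s_i c^(i)
    a = a.copy()
    for _ in range(sweeps):
        for phi in range(len(a)):
            if phi not in clamped:
                iota = W[phi] @ a
                a[phi] = float(rng.random() < 1 / (1 + np.exp(-iota / T)))
    return a

# P+ clamps input & output; P- clamps input only.
a_plus = np.array([1.0, 1.0])                        # fully clamped
a_minus = sample(np.array([1.0, 0.0]), clamped={0})
# Delta s_i = eps * [E{H_i|P+} - E{H_i|P-}] (one-sample estimate)
eps = 0.1
s = s + eps * (harmony_per_constraint(a_plus)
               - harmony_per_constraint(a_minus))
print(s)
```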
Crucial Open Question (Truth in Advertising)

• What is the relation between strict domination and neural networks?
• Apparently not a problem in the case of the CV Theory
To be encoded

• How many different kinds of units are there?
• What information is necessary (from the source unit’s point of view) to identify the location of a target unit, and the strength of the connection with it?
• How are constraints initially specified? How are they maintained through the learning process?
Unit types

• Input units: C, V
• Output units: C, V, x
• Correspondence units: C, V
• 7 distinct unit types, each represented in a distinct sub-region of the abstract genome
• We ‘help ourselves’ to implicit machinery to spell out these sub-regions as distinct cell types, located in a grid as illustrated

Connectivity geometry

• Assume a 3-d grid geometry

[Diagram: C and V cells in the grid, with axes labeled ‘E’, ‘N’, ‘back’]
Constraint: PARSE

[Diagram: the PARSE subnetwork laid out in the grid]

• Input units grow south and connect
• Output units grow east and connect
• Correspondence units grow north & west and connect with input & output units
Constraint: ONSET

• Short connections grow north–south between adjacent V output units, and between the first V node and the first x node

Direction of projection growth

• Topographic organizations are widely attested throughout neural structures (activity-dependent growth is a possible alternative)
• Orientation information (axes): chemical gradients during development (cell age is a possible alternative)

Projection parameters

• Direction
• Extent: local or non-local
• Target unit type
• Strength of connections encoded separately
Connectivity Genome

Contributions from ONSET and PARSE:

Source:    Projections:
CI         S L CC
VI         S L VC
CO         E L CC
VO         E L VC; N&S S VO
CC         N L CI; W L CO
VC         N L VI; W L VO
x0         S S VO; N S x0

Key: Direction = N(orth), S(outh), E(ast), W(est), F(ront), B(ack); Extent = L(ong), S(hort); Target = Input CI, VI; Output CO, VO, x(0); Correspondence CC, VC

ONSET’s contributions: x0 segment: S S VO | N S x0; VO segment: N&S S VO
Encoding connection strength

For each constraint i, the genome needs to ‘embody’:
• the constraint strength sᵢ
• the connection coefficients c⁽ⁱ⁾_ΦΨ (one per pair of cell types Φ, Ψ)

The product of these is constraint i’s contribution to the Φ–Ψ connection weight:
$$W_{\varphi\psi} = \sum_{i=1}^{N_{\mathrm{con}}} s_i\, c^{(i)}_{\Phi\Psi}$$
Network-level specification

Processing: constraint i contributes a weight wᵢ = [Pᵢ]·Rᵢ ∝ sᵢ·c⁽ⁱ⁾ to the Φ–Ψ connection; the total weight is W = Σᵢ wᵢ.

Development: for each constraint i, genes Gᵢ specify receptor (Rᵢ) and ligand (Lᵢ) molecules realizing the coefficients c⁽ⁱ⁾.

Learning: when units φ and ψ are simultaneously active, [Pᵢ] changes by an amount ∝ KᵢLᵢ ∝ c⁽ⁱ⁾ (during phase P⁺; reverse during P⁻).
Learning Behavior
• The simplified system can be solved analytically
• The learning algorithm turns out to reduce to: Δsᵢ ∝ [# violations of constraint i, P⁻]
Abstract Gene Map

General Developmental Machinery | Connectivity | Constraint Coefficients

Connectivity region (direction, extent, target):
• C-I: S L CC
• V-I: S L VC
• C-C: F S VC; N/E L CC&VC; S/W L CC&VC

Constraint-coefficient region:
• G: CO&V&x B 1; CC&VC B 2
• CORRESPOND: CC: CI&CO 1; VC: VI&VO 1
Summary
Described an attempt to integrate
• connectionist theory of mental processes (computational neuroscience, cognitive psychology)
• symbolic theory of mental functions (philosophy, linguistics)
  – Representations: general structure (philosophy, AI); specific structure (linguistics)

Informs the theory of UG
• Form, content
• Genetic encoding
Mystery Quote #4
“Smolensky, it would appear, would like a special dispensation for connectionist cognitive science to get the goodness out of Classical constituents without actually admitting that there are any.”
Mystery Quote #5
“ The view that the goal of connectionist research should be to replace other methodologies may represent a naive form of eliminative reductionism. … The goal … should not be to replace symbolic cognitive science, but rather to explain the strengths and weaknesses of existing symbolic theory; to explain how symbolic computation can emerge out of non‑symbolic computation; to enrich conceptual‑level research with new computational concepts and techniques that reflect an understanding of how conceptual‑level theoretical constructs emerge from subconceptual computation…”
Thanks for your attention