Jakobson's Grand Unified Theory of Linguistic Cognition
Paul Smolensky
Cognitive Science Department, Johns Hopkins University
with: Elliott Moreton, Karen Arnold, Donald Mathis, Melanie Soderstrom, Géraldine Legendre, Alan Prince, Peter Jusczyk, Suzanne Stevenson
Grammar and Cognition
1. What is the system of knowledge?
2. How does this system of knowledge arise in the mind/brain?
3. How is this knowledge put to use?
4. What are the physical mechanisms that serve as the material basis for this system of knowledge and for the use of this knowledge?
(Chomsky '88, p. 3)
Advertisement
The complete story, forthcoming (2003) from Blackwell:
Smolensky & Legendre, The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar
A Grand Unified Theory for the cognitive science of language is enabled by Markedness: Avoid α
① Structure
• Alternations eliminate α
• Typology: inventories lack α
② Acquisition
• α is acquired late
③ Processing
• α is processed poorly
④ Neural
• Brain damage most easily disrupts α
Jakobson’s Program
Formalize through OT?
[Diagram: OT at the center, linked to ① Structure, ② Acquisition, ③ Use, ④ Neural Realization]
Structure
• Theoretical. OT (Prince & Smolensky '91, '93):
– Constructs formal grammars directly from markedness principles
– General formalism/framework for grammars: phonology, syntax, semantics; GB/LFG/…
– Strongly universalist: inherent typology
• Empirical. OT:
– Allows completely formal markedness-based explanation of highly complex data
Acquisition
• Theoretical. The formal structure enables OT-general learning algorithms:
– Constraint Demotion: provably correct and efficient (when part of a general decomposition of the grammar learning problem) (Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000)
– Gradual Learning Algorithm (Boersma 1998 et seq.)
• Empirical
– Initial state: predictions explored through behavioral experiments with infants
Use
• Theoretical
– Theorems regarding the computational complexity of algorithms for processing with OT grammars (Tesar '94 et seq.; Ellison '94; Eisner '97 et seq.; Frank & Satta '98; Karttunen '98)
• Empirical (with Suzanne Stevenson)
– Typical sentence-processing theory: heuristic constraints
– OT: an output for every input; enables incremental (word-by-word) processing
– Empirical results concerning human sentence-processing difficulties can be explained with OT grammars employing independently motivated syntactic constraints
– The competence theory [OT grammar] is the performance theory [human parsing heuristics]
Neural Realization
• Theoretical. OT derives from the theory of abstract neural (connectionist) networks, via Harmonic Grammar (Legendre, Miyata & Smolensky '90). For moderate complexity, we now have general formalisms for realizing:
– complex symbol structures as distributed patterns of activity over abstract neurons
– structure-sensitive constraints/rules as distributed patterns of strengths of abstract synaptic connections
– optimization of Harmony
• Construction of a miniature, concrete LAD
Program
Structure: OT
• Constructs formal grammars directly from markedness principles
• Strongly universalist: inherent typology
• Allows completely formal markedness-based explanation of highly complex data
Acquisition
• Initial-state predictions explored through behavioral experiments with infants
Neural Realization
• Construction of a miniature, concrete LAD
The Great Dialectic
Phonological representations serve two masters, locked in conflict:
• Phonetic interface [surface form]: often 'minimize effort (motoric & cognitive)'; 'maximize discriminability' → MARKEDNESS
• Lexical interface /underlying form/: recoverability, 'match this invariant form' → FAITHFULNESS
OT from Markedness Theory
• MARKEDNESS constraints *α: No α
• FAITHFULNESS constraints
– Fα demands that /input/ → [output] leave α unchanged (McCarthy & Prince '95)
– Fα controls when α is avoided (and how)
• Interaction of violable constraints: Ranking
– α is avoided when *α ≫ Fα
– α is tolerated when Fα ≫ *α
– M1 ≫ M2: combines multiple markedness dimensions
• Typology: all cross-linguistic variation results from differences in ranking, i.e. in how the dialectic is resolved (and in how multiple markedness dimensions are combined)
• Harmony = MARKEDNESS + FAITHFULNESS
– A formally viable successor to Minimize Markedness is OT's Maximize Harmony (among competitors)
Structure
Explanatory goals achieved by OT:
• Individual grammars are literally and formally constructed directly from universal markedness principles
• Inherent typology: within the analysis of phenomenon Φ in language L is inherent a typology of Φ across all languages
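Inherent typology can likewise be made mechanical: re-ranking the same constraint set enumerates the predicted cross-linguistic variation (a factorial typology). A sketch reusing eval_ot() and violations from the previous example:

```python
from itertools import permutations

con = ['Mark(NPA)', 'Faith']
for ranking in permutations(con):
    winner, = eval_ot(['inpossible', 'impossible'], list(ranking), violations)
    print(' >> '.join(ranking), '->', winner)
# Mark(NPA) >> Faith -> impossible   (α avoided: assimilating language)
# Faith >> Mark(NPA) -> inpossible   (α tolerated: non-assimilating language)
```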
Structure: Summary
• OT builds formal grammars directly from markedness: MARK, with FAITH
Friday:
• Inventories consistent with markedness relations are formally the result of OT with local conjunction
• Even highly complex patterns can be explained purely with simple markedness constraints: all complexity lies in the constraints' interaction through ranking and conjunction (Lango ATR vowel harmony)
Program: Acquisition
• Initial-state predictions explored through behavioral experiments with infants
Nativism I: Learnability
• Learning algorithm
– Provably correct and efficient (under strong assumptions)
– Sources: Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000
– If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E
Tableau for /in+possible/, initial ranking Faith ≫ Mark (NPA):
Candidates: E = inpossible (violates Mark (NPA)); A = impossible (violates Faith)
☹ ☞ E: the learner wrongly selects the faithful but marked candidate
After minimally demoting Faith below Mark (NPA):
☺ ☞ A: impossible now wins
If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E.
Constraint Demotion Learning
• Correctly handles the difficult case: multiple violations in E
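A minimal sketch of one Constraint Demotion step on a single winner/loser pair, assuming the hierarchy is a list of strata (highest first) and the pair is informative (some constraint prefers the winner); function and variable names are ours, not Tesar & Smolensky's:

```python
def constraint_demotion(hierarchy, winner_viols, loser_viols):
    """Demote each constraint preferring the loser E below the highest
    stratum containing a constraint preferring the winner A."""
    winner_prefs = {c for c in winner_viols if loser_viols[c] > winner_viols[c]}
    loser_prefs = {c for c in winner_viols if winner_viols[c] > loser_viols[c]}
    # Highest stratum with a winner-preferring constraint:
    top = next(i for i, s in enumerate(hierarchy) if s & winner_prefs)
    for stratum in hierarchy[:top + 1]:
        for c in stratum & loser_prefs:       # minimal demotion
            stratum.remove(c)
            if top + 1 >= len(hierarchy):
                hierarchy.append(set())
            hierarchy[top + 1].add(c)
    return [s for s in hierarchy if s]        # drop emptied strata

# Heard A = impossible (violates Faith), expected E = inpossible (violates Mark):
hierarchy = [{'Faith'}, {'Mark(NPA)'}]
print(constraint_demotion(hierarchy,
                          winner_viols={'Faith': 1, 'Mark(NPA)': 0},
                          loser_viols={'Faith': 0, 'Mark(NPA)': 1}))
# -> [{'Mark(NPA)'}, {'Faith'}]   i.e. Mark(NPA) >> Faith
```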
Nativism I: Learnability
• M ≫ F is learnable with /in+possible/ → impossible
– 'not' = in- except when followed by …
– the 'exception that proves the rule': M = NPA
• M ≫ F is not learnable from data if there are no 'exceptions' (alternations) of this sort: e.g., if the lexicon produces only inputs with mp, never np, then M and F never conflict and there is no evidence for their ranking
• Thus the initial state ℌ0 must have M ≫ F
The Initial State
• OT-general: MARKEDNESS ≫ FAITHFULNESS
– Learnability demands it (Richness of the Base) (Alan Prince, p.c., '93; Smolensky '96a)
– Child production: restricted to the unmarked
– Child comprehension: not so restricted (Smolensky '96b)
Nativism II: Experimental Test
Collaborators: Peter Jusczyk, Theresa Allocco (Language Acquisition, 2002)
• Linking hypothesis: more harmonic phonological stimuli ⇒ longer listening time
• More harmonic:
– M ≻ *M, when equal on F
– F ≻ *F, when equal on M
– When one must choose one or the other, it is more harmonic to satisfy M: M ≫ F
• M = Nasal Place Assimilation (NPA)
• X/Y/XY paradigm (P. Jusczyk)
Experimental Paradigm
• Headturn Preference Procedure (Kemler Nelson et al. '95; Jusczyk '97)
• X/Y/XY paradigm, e.g. un…b…umb: test items either stand in a FAITH relation to the familiarization frames or do not; um…b…umb vs. um…b…iŋgu, and iŋ…gu…iŋgu vs. iŋ…gu…umb (p = .006)
• Highly general paradigm. Main result:
15.36
12.31
0
2
4
6
8
10
12
14
16
18
20
Faithfulness Markedness M ≫ F
Tim
e (s
ec)
Higher HLower H
4.5 Months (NPA)Higher
HarmonyLower Harmony
um…ber…umber
um…ber… iŋgu
p = .006 (11/16)
[Bar chart: 4.5 months (NPA), Markedness condition. Listening time (sec): Higher Harmony um…ber…umber = 15.23 vs. Lower Harmony un…ber…unber = 12.73; p = .044 (11/16).]
[Bar chart: 4.5 months (NPA), the critical Markedness-vs.-Faithfulness condition: un…ber…umber (satisfies Markedness, violates Faithfulness) vs. un…ber…unber (satisfies Faithfulness, violates Markedness). Which has higher Harmony? ???]
[Bar chart: 4.5 months (NPA), all conditions, listening time (sec): Faithfulness 15.36 vs. 12.31; Markedness 15.23 vs. 12.73; M ≫ F: Higher Harmony un…ber…umber = 16.75 vs. Lower Harmony un…ber…unber = 14.01; p = .001 (12/16). Infants listen longer to the Markedness-satisfying form: M ≫ F in the initial state.]
Program: Neural Realization
• Construction of a miniature, concrete LAD
The question
• The nativist hypothesis, central to generative linguistic theory: grammatical principles respected by all human languages are encoded in the genome.
• Questions:
– Evolutionary theory: How could this happen?
– Empirical question: Did this happen?
– Today: What — concretely — could it mean for a genome to encode innate knowledge of universal grammar?
UGenomics
• The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device.
¿ Proteins ⇝ universal grammatical principles ?
Time to willingly suspend disbelief …
• Case study: Basic CV Syllable Theory (Prince & Smolensky '93)
• Innovation: introduce a new level, an 'abstract genome', parallel to [and encoding] the 'abstract neural network'
Approach: Multiple Levels of Encoding
[Diagram: Grammar / Innate Constraints; Abstract Neural Network / Abstract Genome; Biological Neural Network / Biological Genome. One relation: A instantiates B; the other: A encodes B.]
UGenome for CV Theory
• Three levels:
– Abstract symbolic: Basic CV Theory
– Abstract neural: CVNet
– Abstract genomic: CVGenome
UGenomics: Symbolic Level
Basic syllabification: Function
• Basic CV Syllable Structure Theory
– 'Basic': no more than one segment per syllable position: .(C)V(C).
• ƒ: /underlying form/ → [surface form]
– e.g. /CVCC/ → [.CV.CVC.] (second V epenthetic); cf. /pæd+d/ → [pædəd]
• Correspondence Theory (McCarthy & Prince 1995, 'M&P')
– /C1V2C3C4/ → [.C1V2.C3VC4.] (V epenthetic)
Why basic CV syllabification?
• ƒ: underlying → surface linguistic forms
• Forms simple but combinatorially productive
• Well-known universals; typical typology
• A mini-component of real natural-language grammars
• A (perhaps the) canonical model of universal grammar in OT
Syllabification: Constraints (Con)
• PARSE: every element in the input corresponds to an element in the output
• ONSET: no V without a preceding C
• etc.
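To make these concrete, here is a toy rendering (our own names, not from the talk) of candidates as syllable lists plus a count of unparsed input segments, with violation counters for PARSE and ONSET:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Syllable:
    onset: Optional[str]    # e.g. 'C1'; None if onsetless
    nucleus: Optional[str]  # e.g. 'V2', or 'V' for an epenthetic vowel
    coda: Optional[str]

def parse_viols(syllables, n_unparsed):
    # PARSE: one violation per input element with no output correspondent
    return n_unparsed

def onset_viols(syllables, n_unparsed):
    # ONSET: one violation per V without a preceding C
    return sum(1 for s in syllables if s.nucleus and s.onset is None)

# /C1 V2 C3 C4/ -> [.C1V2.C3VC4.]: C4 parsed into a syllable with epenthetic V
cand = [Syllable('C1', 'V2', None), Syllable('C3', 'V', 'C4')]
print(parse_viols(cand, 0), onset_viols(cand, 0))   # 0 0
```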
UGenomics: Neural Level
CVNet Architecture
• ƒ: /C1 C2/ → [C1 V C2]
[Network diagram: input units for /C1 C2/, output units for [C1 V C2], and correspondence units linking input segments '1' and '2' to their output correspondents.]
Connection substructure
• Local: fixed, genetically determined (the content of constraint i: its coefficients c_i^ΦΨ)
• Global: variable during learning (the strength s_i of constraint i)
• Network weight: W_ΦΨ = Σ_{i=1..N_con} s_i c_i^ΦΨ
• Network input: ι = W a, i.e. ι_Ψ = Σ_Φ W_ΨΦ a_Φ
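A sketch of this superposition with NumPy (toy sizes and random coefficients standing in for the genetically fixed constraint contents): the weight matrix is the strength-weighted sum of per-constraint coefficient matrices, and Harmony is the usual quadratic form:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_con = 6, 2
# Fixed 'contents' c_i: one symmetric coefficient matrix per constraint
C = rng.choice([-1.0, 0.0, 1.0], size=(n_con, n_units, n_units))
C = (C + C.transpose(0, 2, 1)) / 2
s = np.array([3.0, 1.0])          # learned constraint strengths s_i
W = np.tensordot(s, C, axes=1)    # W_ΦΨ = Σ_i s_i c_i^ΦΨ
a = rng.random(n_units)           # activation vector
iota = W @ a                      # net input ι = W a
H = 0.5 * a @ W @ a               # Harmony of the current state
```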
PARSE
[Connection diagram over C, V, and correspondence units.]
• All connection coefficients are +2
ONSET
[Connection diagram over C and V output units.]
• All connection coefficients are 1
Crucial Open Question (Truth in Advertising)
• Relation between strict domination and neural networks?
CVNet Dynamics
• Boltzmann machine / Harmony network (Hinton & Sejnowski '83 et seq.; Smolensky '83 et seq.)
– Stochastic activation-spreading algorithm: higher Harmony ⇒ more probable
– CVNet innovation: connections realize fixed symbol-level constraints with variable strengths
– Learning: modification of the Boltzmann machine algorithm to the new architecture
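For concreteness, a minimal sketch of the stochastic dynamics (standard Boltzmann machine Gibbs sampling over binary units, not the CVNet-specific algorithm): each unit turns on with probability increasing in its net input, so higher-Harmony states are visited more often:

```python
import numpy as np

def gibbs_sweep(a, W, T=1.0, rng=np.random.default_rng()):
    """One asynchronous update sweep; assumes W symmetric with zero diagonal.
    At equilibrium, P(a) ∝ exp(H(a)/T) with H = 1/2 a·W·a."""
    for psi in rng.permutation(len(a)):
        iota = W[psi] @ a                       # net input to unit psi
        p_on = 1.0 / (1.0 + np.exp(-iota / T))  # logistic acceptance rule
        a[psi] = 1.0 if rng.random() < p_on else 0.0
    return a
```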
Learning Behavior
• A simplified system can be solved analytically
• The learning algorithm turns out to give, approximately, s_i = [# violations of constraint i in P]
UGenomics: Genome Level
Connectivity geometry
• Assume 3-D grid geometry
[Diagram: C and V units on the grid, axes labeled 'E', 'N', 'back'.]
Connectivity: PARSE
• Input units grow south and connect
• Output units grow east and connect
• Correspondence units grow north & west and connect with input & output units
Connectivity: ONSET
• x0 segment: S S V-O; N S x0
• V-O segment: N&S S V-O
To be encoded
• How many different kinds of units are there?
• What information is necessary (from the source unit's point of view) to identify the location of a target unit, and the strength of the connection with it?
• How are constraints initially specified?
• How are they maintained through the learning process?
Unit types
• Input units: C, V
• Output units: C, V, x
• Correspondence units: C, V
• 7 distinct unit types, each represented in a distinct sub-region of the abstract genome
• We 'help ourselves' to implicit machinery to spell out these sub-regions as distinct cell types, located in the grid as illustrated
Direction of projection growth
• Topographic organizations widely attested throughout neural structures
– Activity-dependent growth a possible alternative
• Orientation information (axes)
– Chemical gradients during development
– Cell age a possible alternative
Projection parameters
• Direction
• Extent: local or non-local
• Target unit type
• Strength of connections encoded separately
Connectivity Genome
• Contributions from ONSET and PARSE; each projection is a (Direction, Extent, Target) triple:
C-I: S L C-C
V-I: S L V-C
C-O: E L C-C
V-O: E L V-C; N&S S V-O
C-C: N L C-I; W L C-O
V-C: N L V-I; W L V-O
x0: N S x0; S S V-O
Key: Direction = N(orth), S(outh), E(ast), W(est), F(ront), B(ack); Extent = L(ong), S(hort); Targets: input C-I, V-I; output C-O, V-O, x(0); correspondence C-C, V-C
CVGenome: Connectivity (projections as Direction Extent Target triples, by source unit type among C-I, V-I, C-C, V-C, C-O, V-O, x)
IDENTITY: F Sh V-C; B Sh C-C
LINEARITY: N/E L C-C&V-C; N/E L C-C&V-C; S/W L C-C&V-C; S/W L C-C&V-C
INTEGRITY: S L C-C; S L V-C; N L C-C; N L V-C
UNIFORMITY: E L C-C; E L V-C; W L C-C; W L V-C
OUTPUTID: F Sh V-O; B Sh C-O; F Sh C-O; B Sh x; B Sh x; F Sh V-O
NOOUTGAPS: N Sh x*; N Sh x*; S Sh C-O&V-O
RESPOND: (no projections)
CORRESPOND: S L C-C; S L V-C; N L C-I; N L V-I; E L C-C; E L V-C; W L C-O; W L V-O
PARSE: S L C-C; S L V-C; N L C-I; N L V-I; E L C-C; E L V-C; W L C-O; W L V-O
FILL-V: S L V-C; N L V-I; W L V-O; E L V-C
FILL-C: S L C-C; N L C-I; E L C-C; W L C-O
ONSET: N Sh V-O; S Sh 1st V-O; S Sh V-O; N Sh 1st x
NOCODA: N Sh C-O; N Sh C-O; S Sh C-O; S Sh x
Encoding connection strength
• For each constraint i, the genome must 'embody':
– the constraint strength s_i
– the connection coefficients c_i^ΦΨ (one per pair of cell types Φ, Ψ)
• The product of these is constraint i's contribution to the Φ-Ψ connection weight: W_ΦΨ = Σ_{i=1..N_con} s_i c_i^ΦΨ
Network-level specification
[Schematic, per constraint i:
Processing: the strength s_i is realized as an abstract protein concentration, [P_i] ∝ s_i; the weight contribution is w_i ∝ [P_i R_i] c_i, hence w_i ∝ s_i c_i, and W = Σ_i w_i.
Development: genes G_i express products (R_i G_i c_i; G_i c_i^0; L_i G_i c_i) that lay down each constraint's connections.
Learning: when Φ and Ψ are simultaneously active, [P_i K_i] binds L_i G_i c_i (during phase P+; reverse during P−).]
CVGenome: Connection Coefficients (Constraint: From → To, Strength)
IDENTITY: C-C → V-C, 1
LINEARITY: C-C&V-C → C-C&V-C, 1
INTEGRITY: C-C&V-C → C-C&V-C, 1
UNIFORMITY: C-C → C-C, 1
OUTPUTID: C-O&V-O&x → C-O&V-O&x, 2
NOOUTGAPS: x → C-O&V-O, 1
RESPOND: C-O&V-O&x → bias, 1
CORRESPOND: C-C&V-C → bias, 2; C-C → C-I&C-O, 1; V-C → V-I&V-O, 1
PARSE: C-C&V-C → bias, 3; C-I&V-I → bias, 1; C-I&C-O → C-C, 2; V-I&V-O → V-C, 2
FILL-V: V-C → bias, 3; V-O → bias, 1; V-I&V-O → V-C, 2
FILL-C: C-C → bias, 3; C-O → bias, 1; C-I&C-O → C-C, 2
ONSET: V-O → V-O&x, 1
NOCODA: C-O → C-O&x, 1
Abstract Gene Map
[Schematic: genome regions for General Developmental Machinery, Connectivity, and Constraint Coefficients. Connectivity genes specify direction, extent, target, e.g. C-I: S L C-C; V-I: S L V-C; C-C: F S V-C, N/E L C-C&V-C, S/W L C-C&V-C. Coefficient genes, e.g. RESPOND: C-O&V-O&x bias 1; CORRESPOND: C-C&V-C bias 2, C-C → C-I&C-O 1, V-C → V-I&V-O 1.]
UGenomics
• Realization of processing and learning algorithms in ‘abstract molecular biology’, using the types of interactions known to be biologically possible and genetically encodable
UGenomics
• A host of questions to address:
– Will this really work?
– Can it be generalized to distributed nets?
– Is the number of genes [77 = 0.26%] plausible?
– Are the mechanisms truly biologically plausible?
– Is it evolvable?
– How is strict domination to be handled?
Hopeful Conclusion
• Progress is possible toward a Grand Unified Theory of the cognitive science of language:
– addressing the structure, acquisition, use, and neural realization of knowledge of language
– strongly governed by universal grammar
– with markedness as the unifying principle
– as formalized in Optimality Theory at the symbolic level
– and realized via Harmony Theory in abstract neural nets which are potentially encodable genetically
• Still lots of promissory notes, but all in a common currency — Harmony ≈ unmarkedness; hopefully this will promote further progress by facilitating integration of the sub-disciplines of cognitive science
Thank you for your attention (and indulgence)