
Modeling Language Acquisition with Neural Networks A preliminary research plan Steve R. Howell


Page 1: Modeling Language Acquisition with Neural Networks A preliminary research plan Steve R. Howell

Modeling Language Acquisition with Neural Networks

A preliminary research plan

Steve R. Howell

Page 2:

Presentation Overview

Goals & challenges

Previous & related research

Model Overview

Implementation details

Page 3:

Project Goals

Model two aspects of human language (grammar and semantics)

Use single neural network performing word prediction

Use orthographic representation

Use small but functional word corpus

– e.g. child’s basic functional vocabulary?

Page 4:

Challenges

Need a network architecture capable of modeling both grammar and semantics

What if phonology is required?

Computational limitations

Page 5:

Previous Research

Page 6:

Previous Research

Elman (1990)

Mozer (1987)

Seidenberg & McClelland (1989)

Landauer et al. (LSA)

Rao & Ballard (1997)

Page 7:

Elman (1990)

Simple recurrent network (context units)

No built-in representational constraints

Predicts next input from current plus context

Discovers word boundaries in a continuous stream of phonemes

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.

FOR MORE INFO...
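The architecture described above can be sketched in a few lines. The layer sizes, random weights, and toy one-hot input stream below are illustrative assumptions, not values from Elman (1990); the key idea is that the hidden layer sees the current input plus a copy of its own previous state (the context units), and predicts the next input.

```python
import numpy as np

# Minimal sketch of an Elman-style simple recurrent network (SRN).
rng = np.random.default_rng(0)

n_in, n_hidden = 4, 8                                # illustrative sizes
W_in = rng.normal(0, 0.1, (n_hidden, n_in))          # input -> hidden
W_ctx = rng.normal(0, 0.1, (n_hidden, n_hidden))     # context -> hidden
W_out = rng.normal(0, 0.1, (n_in, n_hidden))         # hidden -> prediction

def srn_step(x, context):
    """One time step: predict the next input from current input + context."""
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    logits = W_out @ hidden
    probs = np.exp(logits) / np.exp(logits).sum()    # softmax over next inputs
    return probs, hidden                             # hidden becomes next context

context = np.zeros(n_hidden)
sequence = np.eye(n_in)[[0, 1, 2, 3]]                # toy one-hot input stream
for x in sequence:
    probs, context = srn_step(x, context)

print(probs)                                         # distribution over next input
```

Training such a net to minimize prediction error is what lets it discover structure (e.g. word boundaries) in the input stream; the sketch shows only the forward pass.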

Page 8:

Mozer (1987) - BLIRNET

Interesting model of spatially parallel processing of independent words

Possible input representation - letter triples

– e.g. money = **M, *MO, …, EY*, Y**

Encodes beginnings and ends of words well, as well as relative letter position, which is important for fitting human relative-position priming data

Mozer, M. C. (1987). Early parallel processing in reading: A connectionist approach. In M. Coltheart (Ed.), Attention and Performance XII: The psychology of reading.

FOR MORE INFO...

Page 9:

Seidenberg & McClelland (1989)

Model of word pronunciation

Again relevant for the orthographic input representation it used - letter triples:

– MAKE = **MA, MAK, AKE, KE** (** = word space)

Distributed coding scheme for triples = distributed internal lexicon

Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.

FOR MORE INFO...
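The triple coding above is easy to sketch: pad the word with word-space markers and slide a three-character window across it. Here a single * stands for the word-space marker that the slide writes as **.

```python
# Sketch of the orthographic letter-triple coding described above.
def letter_triples(word, ws="*"):
    """Pad `word` with word-space markers and return all 3-character windows."""
    padded = ws + word.upper() + ws
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(letter_triples("make"))  # -> ['*MA', 'MAK', 'AKE', 'KE*']
```

Each triple would then be mapped onto a distributed pattern of units, so that the lexicon itself is distributed rather than localist.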

Page 10:

Landauer et al.

“LSA” (Latent Semantic Analysis) - a statistical model of semantic learning

Very large word corpus

Significant computation required

Good performance

Data set apparently proprietary

FOR MORE INFO...

Don’t call them, they’ll call you.
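The core LSA idea can be sketched with a toy word-by-document count matrix reduced by SVD: words that occur in similar documents end up close together in the reduced space. The four-line corpus below is invented for illustration and has nothing to do with Landauer et al.'s (proprietary) data.

```python
import numpy as np

# Toy sketch of LSA: SVD of a word-by-document count matrix.
docs = ["dog barks at cat", "cat chases dog", "stock market rises", "market falls"]
vocab = sorted({w for d in docs for w in d.split()})
counts = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2                                   # keep the top-k latent dimensions
word_vecs = U[:, :k] * S[:k]            # each row: a word in the latent space

def cos(a, b):
    """Cosine similarity between two word vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

idx = {w: n for n, w in enumerate(vocab)}
print(cos(word_vecs[idx["dog"]], word_vecs[idx["cat"]]))      # high similarity
print(cos(word_vecs[idx["dog"]], word_vecs[idx["market"]]))   # low similarity
```

Even in this tiny example, "dog" and "cat" (which co-occur) come out far more similar than "dog" and "market" (which never do), which is the kind of semantic regularity the higher network levels are hoped to capture.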

Page 11:

Rao & Ballard (1997)

Basis for present network

Algorithms based on extended Kalman filter

Internal state variable is output of input × weights

(Internal state) × (transpose of feedforward weights) feeds back to predict next input

Rao, R.P.N. & Ballard, D.H. (1997). Dynamic model of visual recognition predicts neural response properties in the visual cortex. Neural Computation, 9, 721-763.

FOR MORE INFO...
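The prediction step just described can be sketched directly: the internal state is driven by input times weights, and the state times the transpose of the feedforward weights feeds back as a prediction of the input. The dimensions are illustrative, and the simple error-driven update at the end is only a stand-in for Rao & Ballard's extended Kalman filter machinery.

```python
import numpy as np

# Minimal sketch of the feedforward/feedback prediction step.
rng = np.random.default_rng(1)
n_in, n_state = 6, 3                      # illustrative sizes
W = rng.normal(0, 0.5, (n_state, n_in))   # feedforward weights

x = rng.normal(size=n_in)                 # current input
r = W @ x                                 # internal state = input * weights
x_pred = W.T @ r                          # feedback prediction of the input
error = x - x_pred                        # residual (prediction) error

# Crude error-driven weight update in the spirit of predictive coding,
# standing in for the extended Kalman filter update:
lr = 0.01
W += lr * np.outer(r, error)

print(np.linalg.norm(error))              # magnitude of the prediction error
```

In the full model this residual, not the raw input, is what propagates between levels, which is why the architecture scales naturally to the multi-layer version planned here.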

Page 12:

Model Overview

Page 13:

Model Overview

Architecture as in Rao & Ballard

Recurrent structure excellent for temporal variability

Starting with single layer network

Moving to multi-layer Rao & Ballard net

Page 14:

Model Overview (cont’d)

High-level input representations

First layer of net performs word prediction from letters

Second layer adds word prediction from previous words

– Words predict next words - Simple grammar?
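The second layer's job, predicting the next word from previous words, can be illustrated with simple bigram counts, a crude stand-in for the statistics the network would learn. The toy corpus is invented for illustration.

```python
from collections import Counter, defaultdict

# Sketch: next-word prediction from bigram counts.
corpus = "the dog sees the cat . the cat sees the dog .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1              # count each word's followers

def predict_next(word):
    """Most frequent follower of `word` in the toy corpus."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("sees"))  # -> "the"
```

Even these local statistics capture a crude grammar (e.g. "sees" is always followed by a determiner here), which is the sense in which word-to-word prediction amounts to simple grammar learning.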

Page 15:

Model Overview (cont’d)

Additional higher levels should add a larger temporal range of context

Words within a large temporal window around the current word help to predict it

– Implies semantic linkage?

– Analogous to LSA “Bag of words” approach at these levels

Page 16:

Possible Advantages

Lower level units learn grammar

Higher level units learn semantics

Combines grammar-learning methods with “bag of words” approach

Possible modification to language generation

Page 17:

Disadvantages

Complex mathematical implementation

Unclear how well higher levels will actually perform

As yet unclear how to modify the net for language generation

Page 18:

Implementation Details

Page 19:

Implementation Challenges

Locating basic functional vocabulary of English language

– 600-800 words?

Compare to children’s language learning/usage, not adults’

– Locating child data?

Page 20:

Model Evaluation (basic)

Test grammar learning as per Elman

Test semantic regularities as for LSA

Page 21:

Model Evaluation (optimistic)

If generative modifications possible:

Ability to output words/phrases semantically linked to input?

– ElizaNet?

Child Turing Test?

– Human judges compare model output to real children's output for same input?

Page 22:

Current Status

Continue reviewing previous research

Working through implementation details of Rao & Ballard algorithm

Considering different types of high-level input representations

Need to develop/acquire basic English vocabulary/grammar data set

Page 23:

Thank you.

Questions and comments are sincerely welcomed. Thoughts on any of the questions raised herein will be extremely valuable.

FOR MORE INFO...

Please see my web page at: http://www.the-wire.com/~showell/