Bayesian Connections: An Approach to Modeling Aspects of the Reading Process

  • Bayesian Connections: An Approach to Modeling Aspects of the Reading Process
    David A. Medler
    Center for the Neural Basis of Cognition
    Carnegie Mellon University

  • Bayesian Connections
    The Bayesian Approach to Psychology: How do we represent the world? Bayesian Connectionist Framework.
    Bayesian Generative Networks: Learning letters. How does context affect learning? Empirical and Simulation Results.
    Symmetric Diffusion Networks: The Ambiguity Advantage/Disadvantage.
    Closing Remarks

  • Representing the World
    Problem: How do we form meaningful internal representations, P(H), given our observations of the external world, P(D)?
    [Figure: mapping from external data, P(D), to internal hypotheses, P(H)]

  • Bayesian Theory
    For a given hypothesis, H, and observed data, D, the posterior probability of H given D is computed as:

    P(H | D) = P(D | H) P(H) / P(D)

    where
    P(H) = prior probability of the hypothesis, H
    P(D) = probability of the data, D
    P(D | H) = probability of D given H
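As a quick illustration of that computation (not from the original slides; the priors and likelihoods below are invented for the example), the posterior over a small set of competing hypotheses can be computed directly from Bayes' rule:

```python
import numpy as np

# Invented priors P(H) and likelihoods P(D | H) for three competing hypotheses.
priors = np.array([0.5, 0.3, 0.2])          # P(H)
likelihoods = np.array([0.10, 0.40, 0.70])  # P(D | H)

# P(D) by the law of total probability, then Bayes' rule for P(H | D).
evidence = np.sum(likelihoods * priors)
posteriors = likelihoods * priors / evidence

print(posteriors)        # approximately [0.16, 0.39, 0.45]
print(posteriors.sum())  # sums to 1
```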

  • Bayesian Connectionism
    [Figure: data and hypothesis units connected through a mediating layer]

  • It was 20 years ago today...
    An Interactive Activation Model of Context Effects in Letter Perception
    James L. McClelland & David E. Rumelhart (1981; 1982)

    Word superiority effect: words > pseudowords > nonwords

    The model accounted for the time course of perceptual identification.

  • Interactive Activation Model

  • Interactive Activation Model

  • 20 Years Later...
    The Interactive Activation (IA) Model has been influential.
    Many positives, but 20 years of negatives.

    Internal representations are hard-coded: the Interactive Activation Model does not learn!

  • Bayesian Connections
    The Bayesian Approach to Psychology: How do we represent the world? Bayesian Connectionist Framework.
    Bayesian Generative Networks: Learning letters. How does context affect learning? Empirical and Simulation Results.
    Symmetric Diffusion Networks: The Ambiguity Advantage/Disadvantage.
    Closing Remarks

  • Bayesian Generative Networks
    Initial work is an expansion of the Bayesian Generative Network framework of Lewicki & Sejnowski (1997).
    It is an unsupervised learning paradigm for multilayered architectures.
    We simplified the network equations, added sparse coding constraints, & included a supervised component.

  • Bayesian Generative Networks
    [Figure: network architecture with a mediating layer]

  • Sparse Coding Constraints
    We modified the basic framework to include sparse coding constraints.
    These act as a Bayesian prior that constrains the types of representations learned.
    Sparse coding encourages the network to represent any given input pattern with relatively few active units.
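As a minimal sketch of what such a constraint can look like in practice (the penalty form, its weight, and the function name are assumptions, not the slides' actual equations), a sparse prior is often implemented as a penalty on hidden-unit activity added to the reconstruction objective:

```python
import numpy as np

def sparse_cost(visible, hidden, weights, sparsity_weight=0.1):
    """Reconstruction error plus a sparsity penalty on hidden activity.

    visible: (n_visible,) observed pattern; hidden: (n_hidden,) representation;
    weights: (n_hidden, n_visible) generative weights. The L1 penalty is one
    common way to encode a sparse prior; the talk's exact form is not given.
    """
    reconstruction = hidden @ weights                   # top-down generation
    error = np.sum((visible - reconstruction) ** 2)     # data fit
    penalty = sparsity_weight * np.sum(np.abs(hidden))  # favor few active units
    return error + penalty

# Illustrative call with made-up values.
print(sparse_cost(np.array([1., 0., 1., 0.]),
                  np.array([0.9, 0.0, 0.1]),
                  np.zeros((3, 4))))
```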

  • Step 1: Learning the Alphabet
    The first stage of the IA model is the mapping between features and letters.
    We use the Rumelhart & Siple (1974) character features.

  • Network Learning
    16 surface units (corresponding to 16 line segments)
    30 representation units

    Trained for 50 epochs (evaluated at epochs 1, 10, 25, & 50)

    Evaluated:
    Generative capability of the network
    Internal representations formed
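Purely as an illustrative sketch of this setup (the 16/30 unit counts and 50-epoch schedule come from the slide; the activation function, learning rule, learning rate, and stand-in feature vectors are assumptions, and the talk's actual Bayesian generative update with its sparse prior is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

n_surface, n_hidden = 16, 30     # 16 line-segment units, 30 representation units
n_epochs = 50                    # trained for 50 epochs
eval_epochs = {1, 10, 25, 50}    # evaluation points

# Stand-in binary feature vectors for 26 letters; the real Rumelhart & Siple
# (1974) feature codes are not reproduced here.
letters = rng.integers(0, 2, size=(26, n_surface)).astype(float)

weights = rng.normal(0.0, 0.1, size=(n_hidden, n_surface))
lr = 0.01                        # assumed learning rate

for epoch in range(1, n_epochs + 1):
    for x in letters:
        h = np.tanh(weights @ x)                # bottom-up encoding (assumed)
        x_hat = h @ weights                     # top-down generative pass
        weights += lr * np.outer(h, x - x_hat)  # reduce reconstruction error
    if epoch in eval_epochs:
        recon = np.tanh(letters @ weights.T) @ weights
        print(f"epoch {epoch}: mean reconstruction error "
              f"{np.mean((letters - recon) ** 2):.4f}")
```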

  • Generating the Alphabet

  • Interpreting Weight Structure

  • Network Weights
    [Figure: learned weights without and with sparse coding, at epochs 1, 10, 25, & 50]

  • What We Have Learned
    In the unsupervised framework, the Bayesian Generative Network is able to learn the alphabet.
    Representations are not necessarily the same as in the IA model:
    distributed (not localist)
    redundant (features are coded several times)
    Having learned the letters, can we now learn words?

  • Step 2: Learning Words
    The second stage of the IA model is the mapping from letters to words.
    The IA model is able to account for the word superiority effect using orthographic information only.
    We are interested in how the Bayesian framework accounts for the development of the word superiority effect.
    We look at participants' learning of context.

  • Experimental Motivation
    Our motivation for the current experiments is the word superiority effect.
    Specifically, we draw inspiration from the Reicher-Wheeler paradigm.

    KQZW+   GLUR+   READ+

  • The Task
    The current set of studies was designed to examine how the word superiority effect may develop. Specifically, we were interested in:
    the learning of novel, letter-like stimuli
    whether stimuli were learned in parts or as wholes
    the effects of context on learning.
    Consequently, we created an artificial environment in which we tightly controlled context.

  • Experimental Design: Training
    The Reicher-Wheeler task is based on the discrimination between two characters.
    We wanted a similar task in which context would interact with a character pair.

  • Experimental Design: Testing
    Total of 16 stimuli
    Task: detect a change
    Testing: 288 stimuli

  • Stimuli
    Characters were constructed from the Rumelhart & Siple features.
    Each character had six line segments, with the following constraints:
    characters were continuous
    no two segments formed a straight line
    no character was a mirror image or rotation of another.

  • Initial Simulations

  • Initial Simulation Results

  • Simulation Conclusions
    Regardless of the network architecture, all simulations showed a (slight) difference between the familiar and crossed stimuli.
    No simulation performed well on the novel stimuli in comparison to the other stimuli.
    These results are somewhat counter to what we expected.
    Is the model broken?
    How do participants perform on this task?

  • Stimulus Presentation

  • Data Analysis
    Each participant's reaction time and proportion of hits and correct rejections were recorded.
    To correct for potential response biases, the scores were converted to d′ scores using:

    d′ = Φ⁻¹(Hit) + Φ⁻¹(CR)

    where Φ⁻¹ is the inverse of the standard normal cumulative distribution.
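A small sketch of that conversion (the clipping of extreme proportions, and the function name, are assumptions; the slides do not say how ceiling or floor rates were handled):

```python
from scipy.stats import norm

def d_prime(hit_rate, cr_rate, eps=1e-3):
    """Convert hit and correct-rejection proportions to a d' score.

    Rates of exactly 0 or 1 are clipped so the inverse-normal transform
    stays finite.
    """
    hit_rate = min(max(hit_rate, eps), 1 - eps)
    cr_rate = min(max(cr_rate, eps), 1 - eps)
    return norm.ppf(hit_rate) + norm.ppf(cr_rate)

print(d_prime(0.85, 0.70))  # illustrative proportions only
```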

  • Experiment 1: One Novel
    4 participants, 10 days each
    1440 trials per day: 288 test trials intermixed with 1152 training trials.
    Three conditions:
    Familiar (AAA or BBB)
    Crossed (BAA or ABB)
    Novel (CAA or CBB)

  • d′ Scores

  • d′ Scores

  • Do They Report a Change?

  • Reaction Times

  • Experiment Conclusions
    Although there is a context effect, it is not as large as we expected, nor as stable.
    There are no significant differences in reaction times for any of the conditions.
    Participants do not perform well in the Novel condition; this is due to a tendency to respond "Change" to all novel stimuli.

  • Re-Simulation of the Task
    The network was trained on the same data set that the participants were trained on.
    The network learned on all training/testing trials.
    We wanted a similar measure for network performance.
    We used a variant of the Kullback-Leibler divergence measure.
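For reference only (the slides' specific variant is not reproduced), the standard Kullback-Leibler divergence between two discrete distributions looks like this; the smoothing constant is an assumption to keep the logarithm finite:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Textbook KL divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()          # renormalize after smoothing
    return float(np.sum(p * np.log(p / q)))

print(kl_divergence([0.7, 0.2, 0.1], [0.5, 0.3, 0.2]))  # illustrative values
```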

  • Simulation: Difference Measure

  • Simulation: Report Change?

  • Internal Representations
    If we look at the internal representations formed by the network, we get an idea of why it behaves as it does...
    [Figure: internal representations by training day]

  • Simulation Conclusions
    The Bayesian Generative Network qualitatively matched the performance of the participants.
    Furthermore, analysis of the internal structure of the network offers an explanation for the participants' behaviour.
    The network failed to learn to represent novel items.
    Thus, if the first generated representation is garbage, and the second generated representation is garbage, then the comparison between them is also garbage, and the network responds "Change".

  • Assessing Representations
    The models predicted that participants in the One Novel condition would fail to learn to represent the novel items.
    Unfortunately, we can't open up a person to see what their internal representation is.
    We can, however, ask them.
    Specifically, we can test their recognition of novel items following training and compare these to truly new items.

  • Experiment 2
    10 participants
    Trained on the same data as Experiment 1, but only run for 2 days.
    At the conclusion of training, participants were given a new/old task in which they saw the 12 old training items, the 6 old novel items, and 12 new items.
    Participants saw a single character and made the judgement "old" or "new".

  • Experiment 2: Results
    Participants were about 70% correct at detecting Old items.

    Participants were no better at recognizing old Novel items than truly New items.

  • Learning Context
    The Bayesian Generative Network is able to learn higher-order information such as which characters appear in which positions.
    It is able to both simulate and explain the performance of participants trained on a contextual learning task.
    It is able to predict new findings!
    Can we expand the model?

  • Bayesian Connections
    The Bayesian Approach to Psychology: How do we represent the world? Bayesian Connectionist Framework.
    Bayesian Generative Networks: Learning letters. How does context affect learning? Empirical and Simulation Results.
    Symmetric Diffusion Networks: The Ambiguity Advantage/Disadvantage.
    Closing Remarks

  • Symmetric Diffusion Networks
    Symmetric Diffusion Networks (SDNs) are a class of networks that explicitly embody many of the implicit assumptions made by the Bayesian Generative Network.

    SDNs can be viewed as a more general form of the Bayesian Generative Network.

  • Symmetric Diffusion Network

  • Symmetric Diffusion Network: Supervised Learning

  • Symmetric Diffusion Network: Unsupervised Learning

  • SDN Representation
    One advantage of the SDN is that it is able to learn continuous probability distributions.
    That is, it can learn multiple representations for the same input data.

    For example, the SDN is able to learn multiple meanings for the word "charge".

  • The Ambiguity Paradox
    Symmetric Diffusion Networks allow us to address the Ambiguity Paradox.
    Ambiguous words are responded to faster than unambiguous words in a lexical decision task.
    Ambiguous words are responded to more slowly than unambiguous words in a semantic relatedness task.

  • The Ambiguity Advantage
    Is it a word?
    Unambiguous: chance
    Ambiguous: charge
    Non-Word: chathe

  • The Ambiguity Disadvantage
    Is it a word? Is it related?
    Unambiguous: chance / luck
    Ambiguous: charge / fee
    Non-Word: chathe / thake

  • One Possible Explanation
    The Efficient-then-Inefficient Hypothesis
    Efficient: Previous models have suggested that the ambiguity advantage results from a blend state (e.g., Joordens & Besner, 1994).
    Inefficient: The ambiguity disadvantage occurs in relatedness judgements because it takes longer to settle into a correct meaning.

  • An Alternative Explanation
    The Symmetric Diffusion Network offers an alternative explanation.

  • Ambiguity Advantage
    [Figure: semantic space trajectories for "chance" (one meaning) and "charge" (multiple meanings, e.g., complaint, tax)]

  • Ambiguity Disadvantage
    [Figure: semantic space trajectories for "charge" settling toward the "complaint" meaning]

  • Preliminary Conclusions
    Symmetric Diffusion Networks are able to learn ambiguous meanings (in contrast to other models).
    They provide a plausible theory for the ambiguity paradox.
    They suggest new empirical studies.

    Larger network simulations are underway.

  • Bayesian Connections
    The Bayesian Approach to Psychology: How do we represent the world? Bayesian Connectionist Framework.
    Bayesian Generative Networks: Learning letters. How does context affect learning? Empirical and Simulation Results.
    Symmetric Diffusion Networks: The Ambiguity Advantage/Disadvantage.
    Closing Remarks

  • What have we learned?
    We introduced a class of connectionist networks that embody Bayesian principles.
    Using the IA model as inspiration, we:
    compared the learned letter representations with the hard-coded representations
    simulated, explained, & predicted empirical data on context learning
    addressed the ambiguity paradox.

  • The Next 20+ Years
    Continue research on learning and how it interacts with the IA model and aspects of the reading process.
    Explore more fully the Bayesian framework and how it relates to connectionism.
    Make links to neurophysiology: can we find evidence of this type of learning and representation at the neural systems level?

  • The Take Home Message
    We are able to effectively model aspects of the reading process with connectionist networks embodying Bayesian principles!
    These networks are able to qualitatively simulate observed data.
    These networks are able to predict new findings.
    Using very simple principles, these networks offer plausible explanations for a range of behaviours.

  • Acknowledgements
    Jay McClelland

    Michael Lewicki, Tai Sing Lee, Michael Harm, David Noelle, Chris Kello, Darren Piercey