

Future directions

- From production to comprehension: We have already proposed a comprehension model which conceptually extends the production model by making contact with neural architecture data, identifying how a multi-route architecture, in which grammatical and world knowledge enter into cooperative computation, can generate semantic representations that can be matched against or modified by those generated by the visual system. This model also addressed how lesions could explain various aspects of agrammatic comprehension performance in sentence-picture matching tasks [4]. The implementation of the comprehension model is underway, and one of the main challenges will lie in integrating production and comprehension, in particular in the explanation of language impairments.

- Better address the cognitive structuring of the visual scene in a perceptual hierarchy through attentional shifts [8].

- Integration of bottom-up and top-down attention using saliency models [6].

- Modeling production impairments: anomia vs. Broca's aphasia vs. Wernicke's aphasia.

Acknowledgement

Thanks to Matt Crocker and Pia Knoeferle for sharing some of their data. This material is based in part on work supported by the National Science Foundation under Grant No. BCS-1343544 (Michael A. Arbib, Principal Investigator).

References

[1] Piaget, J. (1965). The stages of the intellectual development of the child.
[2] Arbib, M.A., Conklin, E.J., and Hill, J.A.C. (1987). From schema theory to language.
[3] Lee, J. (2012). Linking eyes to mouth: a schema-based computational model for describing visual scenes (thesis).
[4] Barrès, V., Lee, J. (2013). Template Construction Grammar: from visual scene description to language comprehension and agrammatism.
[5] Knoeferle, P., and Crocker, M.W. (2006). The Coordinated Interplay of Scene, Utterance, and World Knowledge: Evidence From Eye Tracking.
[6] Navalpakkam, V., Itti, L. (2005). Modeling the influence of task on attention.
[7] Gleitman, L., January, D., Nappa, R., Trueswell, J. (2007). On the give and take between event apprehension and utterance formulation.
[8] Itti, L., Arbib, M.A. (2006). Attention and the Minimal Subscene.

PATIENT-FIRST EXAMPLE

[Figure: saccade sequence over the input scene and samples of the Visual, Semantic, and Grammatical WM states at t = 21, 31, 41, 91, and 291, showing schema instance activations evolving over time. Utterance produced at t = 300: "a cellist is splash -ed by a ballerina".]

From eye movements to information structure and construction selection
The passive construction receives a boost in activation due to its relative preference (compared to the active construction) for mapping onto a meaning representation in which the patient concept has a higher activation than the agent. This is the case here, since the input assumes a more salient patient, with the high saliency value percolating down to the semantic level. Note, however, that due to the exponential decay of instances in the WMs, the initial difference in activation values between patient and agent will tend to attenuate with time, which can in turn impact the construction dynamics in the grammatical WM.

The model here produces saccades based only on bottom-up saliency values: PATIENT > AGENT > ACTION.

At each time step, the state of each working memory is updated.

The time pressure on production is kept low: the model simply has to produce an utterance before t = 300.

Note that at the time of production, not all competitions have been resolved; the best construction assemblage is selected to generate the meaning-form mapping.
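The interplay of saliency-driven activation and exponential decay described above can be sketched minimally as follows. The decay constant, the boost rule, and the names (`decay`, `construction_boost`, `TAU`) are illustrative assumptions, not the model's actual equations:

```python
import math

TAU = 50.0  # hypothetical exponential-decay time constant

def decay(act, dt):
    """Exponential decay of a WM instance's activation over dt time steps."""
    return act * math.exp(-dt / TAU)

def construction_boost(agent_act, patient_act):
    """Boost the construction whose information structure matches the
    current activation ordering: patient > agent favors the passive."""
    if patient_act > agent_act:
        return {"PAS_SVO": patient_act - agent_act, "SVO": 0.0}
    return {"SVO": agent_act - patient_act, "PAS_SVO": 0.0}

# The patient is fixated first, hence more salient, hence more active.
agent, patient = 0.8, 1.0
early = construction_boost(agent, patient)   # passive clearly favored

# With decay, the initial activation difference attenuates over time,
# shrinking (but not reversing) the passive construction's advantage.
agent_late, patient_late = decay(agent, 200), decay(patient, 200)
late = construction_boost(agent_late, patient_late)

assert early["PAS_SVO"] > late["PAS_SVO"] > 0.0
```

Under this sketch, an early patient advantage still favors the passive at production time, but the margin narrows the longer production is delayed.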

Grammatical WM

[Figure: grammatical WM at t = 21, shortly after instantiation of the EXIST_S, A_DET_NOUN, WOMAN ("woman"), and WOMAN2 ("lady") construction instances over the semantic WM instance WOMAN_160, and at t = 411, after the competition between the two lexical instances has been resolved. Simple example of dynamic competition and cooperation in grammatical WM: cooperating constructions generate "There is a woman".]

Schema theory and Template Construction Grammar

[Figure: left, example construction schemas (the lexical construction BLUE, mapping the concept BLUE onto the form "blue", and the SPA construction, mapping an ENTITY-MODIFY-PROPERTY SemFrame onto the SynForm "[NP] is [A]"); right, a schema network in LTM (BILL, EARS, DUCK, RABBIT) whose schemas, activated by lower-level features, are instantiated in working memory, where the instances cover those features.]

Schema theory
Expanding on the notion of schema put forward by Piaget [1], schema theory [2] offers a computational framework to explicitly simulate how schemas can cooperate to organize the behavior of an organism.

LTM, WM, and instantiation
Learned perceptual, motor, and semantic knowledge is represented as a schema network. When activated, schemas are instantiated in working memory, where they remain active as long as they are relevant to the ongoing behavior.

Cooperative computation (C2)
Schemas compete and cooperate to form schema assemblages. Cooperating schemas reinforce each other's activation levels, while competing schemas inhibit each other. These assemblages form flexible and distributed control structures that adaptively organize how information is processed by the organism.

High-level view of a WM schema module
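The cooperative computation just described can be sketched as a simple activation-update loop. The update rule and parameters here are illustrative assumptions, not the model's actual equations: each instance leaks toward zero, gains activation through cooperation links, and loses activation through competition links.

```python
DECAY, COOP, COMP, DT = 0.1, 0.2, 0.3, 1.0  # hypothetical parameters

def c2_step(acts, coop_links, comp_links):
    """One synchronous update of instance activations (name -> activation)."""
    new = {}
    for name, a in acts.items():
        support = sum(acts[o] for o in coop_links.get(name, []))
        inhibition = sum(acts[o] for o in comp_links.get(name, []))
        da = -DECAY * a + COOP * support - COMP * inhibition
        new[name] = min(1.0, max(0.0, a + DT * da))
    return new

# Two lexical instances compete to cover the same semantic content; the one
# that also cooperates with a phrasal instance wins the competition.
acts = {"WOMAN": 0.5, "LADY": 0.5, "A_DET_NOUN": 0.6}
coop = {"LADY": ["A_DET_NOUN"], "A_DET_NOUN": ["LADY"]}
comp = {"WOMAN": ["LADY"], "LADY": ["WOMAN"]}
for _ in range(50):
    acts = c2_step(acts, coop, comp)

assert acts["LADY"] > acts["WOMAN"]
```

The mutually supporting pair converges to high activation while the unsupported competitor decays to zero, which is the qualitative behavior an assemblage needs in order to emerge as a control structure.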

Template Construction Grammar
Language schemas are defined as constructions, mapping meaning onto form. Constructions span a large range of scope, from lexical to argument structure constructions. The linguistic knowledge is defined by a set of constructions and is assumed to be stored in grammatical LTM. Construction instances, through cooperative computation in grammatical WM, incrementally generate flexible meaning-form mappings. (For a simple example, see the Grammatical WM panel.)

(Note: In the current implementation, lexical and non-lexical constructions share a similar representation but are processed slightly differently by the grammatical WM: only lexical constructions receive external activation from the semantic WM.)
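As an illustrative sketch of how a construction pairs a SemFrame with a SynForm and how instances can assemble by slot filling: the field names follow the poster's panels, but this concrete representation and the `assemble()` helper are assumptions for illustration, not the model's actual code.

```python
from dataclasses import dataclass

@dataclass
class Construction:
    name: str
    cls: str             # syntactic class: S, NP, N, A, ...
    sem_frame: list      # meaning side: concepts and semantic relations
    syn_form: list       # form side: words and open slots such as "[NP]"

EXIST_S = Construction("EXIST_S", "S", ["OBJECT"], ["there", "is", "[NP]"])
A_DET_NOUN = Construction("A_DET_NOUN", "NP", ["ENTITY"], ["a", "[N]"])
WOMAN = Construction("WOMAN", "N", ["WOMAN"], ["woman"])

def assemble(cxns):
    """Naively splice each construction's form into the first matching slot."""
    out = list(cxns[0].syn_form)
    for cxn in cxns[1:]:
        slot = out.index(f"[{cxn.cls}]")
        out[slot:slot + 1] = cxn.syn_form
    return " ".join(out)

assert assemble([EXIST_S, A_DET_NOUN, WOMAN]) == "there is a woman"
```

In the model itself, of course, the assemblage is not built by a fixed procedure but emerges from cooperative computation among the instances.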

Introduction
Question: How does our neural system orchestrate the interactions between visuo-attentional and language processing?

The current line of work expands the schema-level Template Construction Grammar (TCG) model of language production developed by Arbib and Lee [3] to offer:
(1) a novel implementation of the production model that is fully dynamic and distributed in its operations and architecture (the focus of this poster);
(2) a unified model of the processes supporting the interaction of vision and language in both the production and comprehension of visual scene descriptions (see future directions).

Here we give a first overview of the novel production model, and we present an example of how it can smoothly integrate visuo-attentional saccadic dynamics with online grammatical processing during a visual scene description task, simulating the impact of eye movements on grammatical structure [7].

[Figure, adapted from Gleitman et al. [7]: attention-capture paradigm (500 ms preview, 60-75 ms cue, then the instruction "Describe the scene"); example production: "The man on the right is kicked by the man on the right".]

Biasing a subject towards observing first the patient of an action favors the choice of a passive construction during scene description.

Scene description model

[Figure: system architecture. A SCENE DESCRIPTION SCHEMA SYSTEM (INPUT, Subscene recognition, PerceptLTM, VisualWM, OUTPUT_1, ConceptLTM, Conceptualizer, SemanticWM, Control) feeds a PRODUCTION SCHEMA SYSTEM (GrammaticalLTM, CxnRetrieval(P), GrammaticalWM(P), PhonologicalWM(P), Utter, OUTPUT_2).]

[Figure: snapshot of the system's WM states at t = 291 for the scene description example: Visual WM (splash event, cellist, ballerina, fencer, bag, and tree instances with saliency-derived activations), Semantic WM (SPLASH, AGENT, PATIENT, BALLERINA, CELLIST), Grammatical WM (EXIST_S, A_DET_NOUN, SVO, PAS_SVO, REL_SVO_WHO, REL_PAS_SVO_WHO, and lexical instances), and Phonological WM ("a", "ballerina", "splash", "a", "cellist") feeding OUTPUT_2.]

All the modules operate in a distributed way, and their operations take place asynchronously. The long-term memories (LTM) hold fixed states representing various forms of knowledge. The working memories (WM) hold the dynamic states of the system. The model has one input (the scene) and two outputs (eye fixations and the utterance). The figure shows a snapshot of the system's WM states (functional links across WMs not shown).
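The distributed, asynchronous operation described above can be sketched as follows. This is a hypothetical API, not the model's actual code: at each system tick only a random subset of modules updates, and outputs propagate along functional links between modules.

```python
import random

random.seed(0)  # reproducible sketch

class Module:
    def __init__(self, name):
        self.name, self.inbox, self.outbox = name, [], []
    def update(self, t):
        # Placeholder dynamics: consume pending inputs, post a state summary.
        self.outbox = [f"{self.name}@{t}:{len(self.inbox)}"]
        self.inbox = []

class System:
    def __init__(self, modules, links):
        self.modules, self.links, self.t = modules, links, 0
    def tick(self):
        self.t += 1
        for m in self.modules.values():
            if random.random() < 0.5:  # asynchrony: not every module updates
                m.update(self.t)
        for src, dst in self.links:    # functional links across WMs
            self.modules[dst].inbox += self.modules[src].outbox

mods = {n: Module(n) for n in ["VisualWM", "SemanticWM", "GrammaticalWM"]}
system = System(mods, [("VisualWM", "SemanticWM"),
                       ("SemanticWM", "GrammaticalWM")])
for _ in range(10):
    system.tick()
```

Because no module waits on any other, each WM's state can evolve at its own pace, which is what allows saccadic and grammatical dynamics to interleave.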

Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, USA

V. Barrès, M. A. Arbib

VISUAL ATTENTION, MEANING, AND GRAMMAR: NEURO-COMPUTATIONAL MODELING OF SITUATED LANGUAGE USE