Handling Spatially Complex English-to-ASL MT with a Multi-Path Pyramidal Architecture Matt Huenerfauth CLUNCH Presentation November 3, 2003


ASL Machine Translation with Pyramids and Invisible Worlds


Today’s Talk

This is work in progress.

• ASL Linguistics and Machine Translation

• Initial Approaches to ASL MT

• Handling Spatially Complex ASL
  – A Multi-Path MT Architecture.
  – Adopting some HMS lab technology.
  – Interesting Linguistic Motivations.

• Current and Future Work

Motivations and Applications

• Only half of deaf high school graduates can read English at a fourth-grade level – despite sophisticated ASL fluency.

• Many efforts to help the deaf access the hearing world forget that English is their second language (and that English is different from ASL).

• Applications for a Machine Translation System:
  – TV captioning, teletype telephones.
  – Human interpreters are intrusive/expensive.
  – Educational tools, access to information.
  – Storage and transmission of ASL.

Output: Signing Virtual Humans

• Virtual reality models of the human form are now articulate & fast enough to produce ASL.

• ASL Generator produces instructions for the avatar, and the avatar performs the signs -- producing animated output for the user.

• Our problem is how to build these instructions.

Virtual Signing Humans

Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al. 2000); Vcom3D Corporation.

ASL Linguistics I

• What is ASL?
  – Real language? Who uses it?
  – Different from SEE or SSE.

• How is it different from English?
  – Grammar, Vocabulary, Visual/Spatial.
  – More than the Hands: Simultaneity!
  – How signs can be changed: Morphology!
  – Use of Space around the Signer…

ASL Linguistics II

• Discourse Space

– Put discourse entities on “shelves” for later referential use.

– “Agreement” - Pronouns, Possessives, Verbs.

– Don’t interpret locations literally. (Placing Bob to the left of Tim does not mean that Bob is to Tim’s left.)

• Three-Dimensional Space

– Space around signer is visually analogous to a real scene.

– Classifier Predicates:
  • Signers describe 3D scenes with their hands.
  • Meaningful handshape and 3D representative movement path.
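The discourse-space bullets above can be illustrated with a minimal sketch, assuming a hypothetical `DiscourseSpace` class and invented locus labels (`a`, `b`, …). The `IX` and `POSS` prefixes echo glosses such as `IXx` and `POSSx` on the next slide. Loci are handed out in order of first mention and carry no literal spatial meaning, which is the point of not interpreting locations literally.

```python
class DiscourseSpace:
    """Toy discourse model: entities are put on arbitrary 'shelves' (loci) in signing space."""

    def __init__(self, loci: tuple[str, ...] = ("a", "b", "c", "d")) -> None:
        self.free_loci = list(loci)         # loci not yet assigned, in order of use
        self.assigned: dict[str, str] = {}  # entity -> locus label

    def locus(self, entity: str) -> str:
        """Return the entity's locus, assigning the next free one on first mention."""
        if entity not in self.assigned:
            if not self.free_loci:
                raise RuntimeError("no free loci left in signing space")
            self.assigned[entity] = self.free_loci.pop(0)
        return self.assigned[entity]

    def pronoun(self, entity: str) -> str:
        # Pointing (IX) at the entity's locus serves as a pronoun ("agreement").
        return "IX" + self.locus(entity)

    def possessive(self, entity: str) -> str:
        return "POSS" + self.locus(entity)


space = DiscourseSpace()
print(space.pronoun("Bob"), space.pronoun("Tim"), space.possessive("Bob"))
# prints: IXa IXb POSSa
```

Later references to Bob reuse locus `a`, so pronouns and possessives stay consistent across the discourse.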

ASL Linguistics III

• Traditional Sentences (no classifier predicates):

Where does Billy attend college?
wh #BILLY IXx GO-TO UNIVERSITY WHERE

• Spatially Complex Sentences (using classifier predicates):

I parked my car next to his cat.
POSSx CAT ClassPred-bent-V-{locate cat in space}
POSS1s CAR ClassPred-3-{park next to cat}

The truck drove down the windy road.
IXx TRUCK ClassPred-3-{drive on windy road}

Initial Approaches to ASL MT

Non-statistical Direct and Transfer MT Architectures

Corpora for ASL?

• ASL has no written form, so there are no newswires or other ready-made sources of text.

• Some groups have attempted to record and annotate video tapes, but the difficulty of creating a useful and consistent manual transcription standard and then performing the transcription makes for very slow work.

• No statistical approaches to ASL MT.

Machine Translation Pyramid

MT Pyramid Dorr 1998.

• Options in MT design.

• No statistics? Then a higher path on the pyramid means:
  – more work
  – smaller domain size
  – subtler divergences handled

Option 1: Direct Translation

• What kind of non-statistical translation is possible if all we do is word-level analysis (i.e., morphology, POS tagging, and sense tagging)?

• Word-for-sign dictionary look-up system.

• This is probably not a sophisticated enough analysis to produce ASL, but it could produce SEE.
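A word-for-sign lookup of the kind described above might look like the following sketch. The dictionary `WORD_TO_SIGN` and its entries are hypothetical, and fingerspelling unknown words (marked `#`, as in `#BILLY`) is an assumed fallback. Because English word order is kept and every word is signed, the output resembles SEE rather than ASL.

```python
import re

# Hypothetical word-for-sign dictionary (English word -> sign gloss).
WORD_TO_SIGN = {
    "where": "WHERE",
    "does": "DO",
    "attend": "GO-TO",
    "college": "UNIVERSITY",
}

def direct_translate(sentence: str) -> list[str]:
    """Word-for-sign lookup that keeps English word order (closer to SEE than ASL)."""
    glosses = []
    for word in re.findall(r"[A-Za-z']+", sentence):
        sign = WORD_TO_SIGN.get(word.lower())
        # Words missing from the dictionary are fingerspelled, marked with '#'.
        glosses.append(sign if sign is not None else "#" + word.upper())
    return glosses

print(direct_translate("Where does Billy attend college?"))
# prints: ['WHERE', 'DO', '#BILLY', 'GO-TO', 'UNIVERSITY']
```

Compare the ASL gloss `#BILLY IXx GO-TO UNIVERSITY WHERE`: the direct output keeps English word order and the extra DO.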

Option 2: Transfer Translation

• Syntactically analyze English text before crossing over to ASL.
  – Capture more divergences and handle more complex phenomena.
  – Can successfully translate many English sentences into ASL.

• Some previous work along these lines.
  – Some systems use deep syntax or simple semantics.
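A transfer system adds a syntactic step before crossing over. As an illustration only, the sketch below applies one hypothetical structural rule to a pre-parsed wh-question: English do-support is dropped, the wh-sign moves to the end, and a `wh` non-manual marker is prefixed, mirroring the Billy example earlier. `WhQuestion`, `LEXICAL_TRANSFER`, and `transfer_wh_question` are invented names, and the rule omits the `IXx` index that appears in that gloss.

```python
from dataclasses import dataclass

@dataclass
class WhQuestion:
    """Hypothetical, much-simplified syntactic analysis of an English wh-question."""
    wh_word: str  # e.g. "where"
    subject: str  # e.g. "Billy"
    verb: str     # main-verb lemma; English do-support is not stored at all
    obj: str      # e.g. "college"

# Hypothetical lexical transfer table (English lemma -> ASL gloss).
LEXICAL_TRANSFER = {"where": "WHERE", "attend": "GO-TO", "college": "UNIVERSITY"}

def to_gloss(word: str) -> str:
    # Unknown words (e.g. names) are fingerspelled, marked with '#'.
    return LEXICAL_TRANSFER.get(word.lower(), "#" + word.upper())

def transfer_wh_question(q: WhQuestion) -> str:
    """Structural transfer: subject-verb-object order with the wh-sign last,
    prefixed by the 'wh' non-manual marker that spans the clause."""
    signs = [to_gloss(q.subject), to_gloss(q.verb), to_gloss(q.obj), to_gloss(q.wh_word)]
    return "wh " + " ".join(signs)

print(transfer_wh_question(WhQuestion("where", "Billy", "attend", "college")))
# prints: wh #BILLY GO-TO UNIVERSITY WHERE
```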

Transfer Issues for ASL

• ASL Discourse Model: topics, referents in space.
• Representing and Generating Non-Manual Signals.
• Computational Model of ASL Phonology:
  – facilitate creation of an ASL lexicon
  – define morphological and phonological operations

• Parameterizing ASL Features for Morphology

• Note: If the system can’t handle a particular input, it just falls back on direct translation, producing signing output closer to SEE than to fluent ASL.

Handling Spatially Complex ASL

Failings of direct and transfer approaches to ASL MT.

But what’s the hard part?

• Previous ASL generation work has ignored spatially complex ASL sentences.
  – Classifier predicates and spatial verbs.
  – Very common, very communicatively useful.

• Difficult to handle in a transfer architecture. (More than syntax is involved in these sentences.)

Translate to a Classifier Predicate

The car drove down the bumpy road past my house.

POSS1s HOUSE ClassPred-C-{locate house}

IXx CAR ClassPred-3-{drive on bumpy road}

• Where are the house, the road, and the car? How close are they? Where does the path start and stop? How do we show that the path is bumpy, winding, or hilly?

Paralinguistic? Iconic? Spatial?

• Linguists debate whether classifier predicates are:
  – Paralinguistic, visually iconic gestural movements
  – Complex, non-spatial polymorphemic constructions
  – Semantically compositional yet still spatially aware

• Pushing the boundaries of ‘language’…
  – May involve gradient information, spatial analogy, scene visualization, and a degree of iconicity.
  – Not clear that traditional linguistic approaches can capture this.
  – Still seems linguistic, however: many constraints…

When the going gets tough…

• …the tough try an interlingua.
  – Hard to address using the morphological, syntactic, and simple semantic information of the English text.
  – Direct and transfer architectures appear insufficient.

• What about an interlingual approach?
  – Problem: it is hard to build an interlingual system for an unlimited (or even medium-sized) domain. Lots of overhead!
  – Interlingual systems are typically built only for limited domains.

Getting by with limited domain?

• What is special about ASL: we can identify the ‘hard’ sentences.
  – Spatially descriptive text: English spatial verbs describing locations, orientations, or movements; spatial prepositions or adverbs; concrete or animate entities; other common motifs or situations in which classifier predicates are used (detected lexically).

• Use a broad-coverage transfer approach for most inputs, and detect when a spatially complex English input sentence requires something more powerful.
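The lexical detection described above could start from something as simple as the sketch below. The cue lists and the decision rule (a spatial verb plus at least one spatial preposition/adverb or concrete-entity cue) are assumptions chosen for illustration, not a tested classifier.

```python
import re

# Hypothetical cue lists. Inflected forms are listed directly because this sketch
# has no lemmatizer; a real system would need far larger, curated resources.
SPATIAL_VERBS = {"drive", "drove", "driven", "park", "parked", "walk", "walked",
                 "sit", "sat", "placed", "turned", "rolled"}
SPATIAL_PREPS = {"next", "past", "down", "up", "under", "above", "beside",
                 "behind", "across", "toward", "onto", "into"}
CONCRETE_NOUNS = {"car", "truck", "cat", "house", "road", "table", "chair", "bicycle"}

def is_spatially_complex(sentence: str) -> bool:
    """Lexical test: a spatial verb plus a spatial preposition/adverb or a concrete entity."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    has_spatial_verb = bool(words & SPATIAL_VERBS)
    has_other_cue = bool(words & (SPATIAL_PREPS | CONCRETE_NOUNS))
    return has_spatial_verb and has_other_cue

for s in ["The car drove down the bumpy road past my house.",
          "I parked my car next to his cat.",
          "Where does Billy attend college?"]:
    print(is_spatially_complex(s), s)
# first column: True, True, False
```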

“Multi-Path” MT?

• Whenever possible, use the simpler, easier-to-build MT approach.

• Only when needed, use the more sophisticated, resource-intensive approach.

• We take advantage of the ‘breadth’ of one and the ‘depth’ of the other.

• If we add direct translation (to SEE) to the picture, we actually have three pathways.
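One way to wire up the three pathways is sketched below, assuming each pathway is a function that returns `None` when it cannot handle a sentence. The fallback from transfer to direct translation follows the earlier note; falling back from the interlingual pathway to transfer is an additional assumption. All pathway functions here are hypothetical stubs.

```python
from typing import Callable, Optional

# A pathway returns ASL output, or None if it cannot handle the sentence.
Pathway = Callable[[str], Optional[str]]

def multi_path_translate(
    sentence: str,
    is_spatially_complex: Callable[[str], bool],
    interlingual: Pathway,
    transfer: Pathway,
    direct: Pathway,
) -> tuple[str, str]:
    """Route a sentence through the pyramid: interlingual for spatial text,
    transfer for most sentences, direct (SEE-like) output as a last resort."""
    candidates: list[tuple[str, Pathway]] = []
    if is_spatially_complex(sentence):
        candidates.append(("interlingual", interlingual))
    candidates.append(("transfer", transfer))
    candidates.append(("direct", direct))
    for name, pathway in candidates:
        output = pathway(sentence)
        if output is not None:
            return name, output
    raise ValueError(f"no pathway could translate: {sentence!r}")


def stub_transfer(sentence: str) -> Optional[str]:
    return None  # pretend the transfer grammar fails on this input

result = multi_path_translate(
    "The car drove down the bumpy road.",
    is_spatially_complex=lambda s: "drove" in s,
    interlingual=lambda s: "IXx CAR ClassPred-3-{drive on bumpy road}",
    transfer=stub_transfer,
    direct=lambda s: "DIRECT-OUTPUT",
)
print(result)
# prints: ('interlingual', 'IXx CAR ClassPred-3-{drive on bumpy road}')
```

Ordering the candidates from deepest to shallowest means the expensive interlingual path is only tried for the sentences the detector flags.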

“Pyramidal” MT?

Don’t interpret this picture as a set of options anymore…

Now it’s a skeleton for a multi-path MT architecture.

MT Pyramid Dorr 1998.

What is our Interlingua?

• What is the language-neutral representation between the English and ASL when talking about a spatially complex scene?

• Intuitively, the signer has a visualization of the 3D scene which they are discussing.

• So, a spatial representation of reality (or the signer’s imagination/conception of this reality) is serving as the interlingua.

This sounds rather ambitious… How could the computer model spatial reality?

What about Virtual Reality?

• Analyze the English text, construct 3D virtual reality representation of the scene, and use VR as basis for generating the spatially iconic classifier predicate movements.

• But has anyone ever attempted to construct a 3D virtual reality representation of a changing scene as it is described by English sentences?

• Actually, the University of Pennsylvania has.


A Useful Technology

Natural Language Command and Control of Virtual Reality Scenes

HMS & NLP Labs: 3D Scene NL-Command

• Have a virtual reality model of characters and objects in a three-dimensional scene.

• Accepts English text input (directions for the characters or objects to follow).

• Produces an animation in which the characters obey the English commands.

• Updates the 3D scene to show changes.

Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000.

Schuler. 2003.

An NL-Controlled 3D Scene

http://hms.upenn.edu/software/PAR/images.html

NL Command and Control

[Figure: the NL command-and-control pipeline: English Text, English Syntax, Filled-In PAR, Animation Script, Animated 3D Scene. Analysis works by selecting a PAR template from the Actionary (PAR templates for entity motions) and filling in its slots; hierarchical planning handles ambiguities and adds more detail…]
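The “select a PAR template and fill in slots” step can be caricatured with plain dictionaries. This is a hypothetical sketch: `ACTIONARY` and `fill_par` are invented names, a slot with a `None` default is treated as required, and the hierarchical planning that the HMS system uses to resolve ambiguities is not modeled.

```python
from typing import Optional

# Hypothetical actionary: verb lemma -> slot names with defaults (None = required slot).
ACTIONARY: dict[str, dict[str, Optional[str]]] = {
    "drive": {"agent": None, "path": None, "manner": "normal"},
    "park": {"agent": None, "location": None},
}

def fill_par(verb: str, args: dict[str, str]) -> Optional[dict[str, Optional[str]]]:
    """Select the template for `verb` and fill its slots from parsed arguments.

    Returns None if the verb has no template or a required slot stays empty
    (a case the real system's planner might resolve instead)."""
    template = ACTIONARY.get(verb)
    if template is None:
        return None
    filled = {slot: args.get(slot, default) for slot, default in template.items()}
    if any(value is None for value in filled.values()):
        return None
    return filled

print(fill_par("drive", {"agent": "car", "path": "down the bumpy road", "manner": "bumpy"}))
# prints: {'agent': 'car', 'path': 'down the bumpy road', 'manner': 'bumpy'}
print(fill_par("park", {"agent": "I"}))  # None: the 'location' slot is missing
```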



What’s a PAR?

“Actionary” = Action Dictionary = List of PAR Templates

Parameterized Action Representation
  participants: [ agent: AGENT
                  objects: OBJECT list ]
  semantics: [ motion: {Object, Translate?, Rotate?}
               path: {Direction, Start, End, Distance}
               termination: CONDITION
               duration: TIME-LENGTH
               manner: MANNER ]
  start: TIME
  prep conditions: CONDITION boolean-exp
  sub-actions: sub-PARs
  parent action: PAR
  previous action: PAR
  next action: PAR

This is a subset of PAR info. http://hms.upenn.edu/software/PAR
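The PAR fields listed above can be mirrored as Python dataclasses. This is a hypothetical sketch for illustration, not the HMS lab's PAR implementation; the field types (plain-text conditions, durations in seconds, `Vec3` coordinates) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

Vec3 = tuple[float, float, float]  # assumed 3D scene coordinates

@dataclass
class Motion:
    obj: str         # the object that moves
    translate: bool  # does it change position?
    rotate: bool     # does it change orientation?

@dataclass
class Path:
    direction: Optional[str] = None
    start: Optional[Vec3] = None
    end: Optional[Vec3] = None
    distance: Optional[float] = None

@dataclass
class PAR:
    """Subset of a Parameterized Action Representation, mirroring the fields listed above."""
    # participants
    agent: str
    objects: list[str]
    # semantics
    motion: Motion
    path: Path = field(default_factory=Path)
    termination: Optional[str] = None  # condition, kept as plain text here
    duration: Optional[float] = None   # time length (unit assumed: seconds)
    manner: Optional[str] = None
    # timing and structure
    start: Optional[float] = None
    prep_conditions: list[str] = field(default_factory=list)
    sub_actions: "list[PAR]" = field(default_factory=list)
    # Links back to other PARs are excluded from repr/eq so cycles cannot recurse forever.
    parent: "Optional[PAR]" = field(default=None, repr=False, compare=False)
    previous: "Optional[PAR]" = field(default=None, repr=False, compare=False)
    next: "Optional[PAR]" = field(default=None, repr=False, compare=False)

    def add_sub_action(self, child: "PAR") -> None:
        child.parent = self
        self.sub_actions.append(child)


drive = PAR(agent="car", objects=["road"],
            motion=Motion(obj="car", translate=True, rotate=False),
            path=Path(direction="along the road"), manner="bumpy")
drive.add_sub_action(PAR(agent="car", objects=[], motion=Motion("car", False, True)))
print(drive.sub_actions[0].parent is drive)  # True
```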

[Figure: an example planning operator (SpecifyLocomotion), showing its Arguments and VerbAdjuncts.]



MT Approach to Classifier Predicates

Using the HMS NL Command and Control Technology


Using this technology…

[Figures: An NL-Controlled 3D Scene (http://hms.upenn.edu/software/PAR/images.html) and a Signing Character. Original image from: Simon the Signer (Bangham et al. 2000).]

“Invisible World” Approach

• Mini VR scene in front of the signer containing the entities from the English text. (They’re invisible.)

• Interpret the English sentences as NL commands. Instantiate PARs which position, move, reorient, and otherwise modify the entities in this world.

• Update the VR model.
• Use the hands to show changes in the invisible scene.
• The VR scene acts as an intermediary between English and ASL.
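The invisible world can be sketched as a table of entity positions plus a log of changes for the generator to render with the hands. In this sketch the `locate` and `move` calls stand in for instantiated PARs; the class names, method names, and coordinate convention are all hypothetical.

```python
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]  # (x, y, z) in invisible-world coordinates (assumed)

@dataclass
class Change:
    entity: str
    kind: str          # "locate" or "move"
    path: list[Vec3]   # positions the entity occupies, in order

@dataclass
class InvisibleWorld:
    """Mini scene in front of the signer; entities are tracked but never rendered."""
    positions: dict[str, Vec3] = field(default_factory=dict)
    log: list[Change] = field(default_factory=list)

    def locate(self, entity: str, at: Vec3) -> None:
        self.positions[entity] = at
        self.log.append(Change(entity, "locate", [at]))

    def move(self, entity: str, waypoints: list[Vec3]) -> None:
        if not waypoints:
            raise ValueError("a movement needs at least one waypoint")
        start = self.positions[entity]  # KeyError if the entity was never placed
        self.positions[entity] = waypoints[-1]
        self.log.append(Change(entity, "move", [start, *waypoints]))


# "The car drove down the bumpy road past my house."
world = InvisibleWorld()
world.locate("house", (0.5, 0.0, 0.0))
world.locate("car", (-1.0, 0.0, 0.5))
# A "bumpy" path: small vertical offsets along the way.
world.move("car", [(-0.5, 0.25, 0.5), (0.0, 0.0, 0.5), (0.5, 0.25, 0.5), (1.0, 0.0, 0.5)])
for change in world.log:
    print(change.entity, change.kind, change.path)
```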

Original image: MT Pyramid Dorr 1998.

Interlingual Pathway for ASL

Our MT picture…

We now have an interlingual pathway.



The NL-Command Technology



This step is harder than it seems…

VR Scene Doesn’t Do It All

• Various factors aside from the movement of the scene itself can affect this generation choice:
  – conventional motifs of expression
    • e.g., furniture or items in a room
  – restrictions on use of multiple hands simultaneously
  – handshape-movement combination constraints
    • e.g., ‘approaching’ constructions
  – discourse or semantic concerns/priorities, etc.

• There’s generation work to be done!

An NL Engineering Solution

• How do we create the classifier predicates from the VR scene?
  – Write rules obeying restrictions that inspect the VR scene, consider the English text semantics, and combine many small units/morphemes to slowly produce or narrow in on a classifier predicate output.
  – Easier approach: lexicalize classifier predicates as much as possible. Define and specify a big list of classifier predicate templates (their performance and semantics). Fill slots based on info in the VR scene.

• HMS: To define the set of possible movement templates, build a PAR “actionary” specifying the animation possibilities.


A Second Actionary: For ASL

• The first actionary (list of PAR templates) we saw was used while analyzing the English text. It listed possible types of movements the imaginary entities perform in the virtual reality scene.

• This second actionary would describe the possible movements of the signer’s hands while performing one or more interrelated classifier predicates (and their discourse/semantic effects).

Original image from: Simon the Signer (Bangham et al. 2000.)
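A lexicalized second actionary could be sketched as a small list of classifier predicate templates whose location slots are filled from the invisible world. The handshapes follow the glosses earlier in the talk (`3` for vehicles, `bent-V` for the cat, `C` for the house), but the template list, the entity classes, and the linear mapping into signing space are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

Vec3 = tuple[float, float, float]

@dataclass(frozen=True)
class CPTemplate:
    """Hypothetical entry in a second, ASL-side actionary of classifier predicates."""
    handshape: str     # as in the glosses above: "3", "bent-V", "C"
    entity_class: str  # semantic class of entity the handshape can represent
    movement: str      # "locate" or "move"

# Toy actionary; a real one would also encode orientation, manner, two-handed forms, etc.
ASL_ACTIONARY = [
    CPTemplate("3", "vehicle", "locate"),
    CPTemplate("3", "vehicle", "move"),
    CPTemplate("bent-V", "animal", "locate"),
    CPTemplate("C", "building", "locate"),
]

@dataclass
class FilledCP:
    template: CPTemplate
    hand_path: list[Vec3]  # hand positions in signing space, in order

def to_signing_space(p: Vec3, scale: float = 0.5, origin: Vec3 = (0.0, 1.0, 0.5)) -> Vec3:
    """Assumed linear map from invisible-world coordinates to a region in front of the signer."""
    x, y, z = p
    ox, oy, oz = origin
    return (ox + scale * x, oy + scale * y, oz + scale * z)

def fill_cp(entity_class: str, movement: str, scene_path: list[Vec3]) -> Optional[FilledCP]:
    """Pick a matching template and fill its location slots from the VR scene."""
    for template in ASL_ACTIONARY:
        if template.entity_class == entity_class and template.movement == movement:
            return FilledCP(template, [to_signing_space(p) for p in scene_path])
    return None  # no template in this toy list: no classifier predicate is produced


cp = fill_cp("vehicle", "move", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(cp.template.handshape, cp.hand_path)  # 3 [(0.0, 1.0, 0.5), (0.5, 1.0, 0.5)]
print(fill_cp("animal", "move", [(0.0, 0.0, 0.0)]))  # None
```

Because `fill_cp` only succeeds when a template exists, the template list restricts what can be produced, while the filled-in coordinates still vary continuously with the scene.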


This step could be hard…


We now have an architecture for the interlingual pathway!

MT Pyramid Dorr 1998.

Direct: Unanalyzable Text

Interlingual: Spatial Text

Transfer: Most Sentences

Multi-Path Pyramidal MT

A Final Consideration

Other motivations for the lexicalized classifier predicate “double actionary” architecture…


Practical engineering motivations for the design: just a hack? Does relying on a template actionary limit the output too much?

Linguistic Motivations

• “Blended Spaces” Lexicalized Classifier Predicate Model of Scott Liddell (2003).
  – The double-actionary design is analogous to this model of how humans generate classifier predicates.
  – This model assumes that signers imagine the entities under discussion occupying the space before them.
  – It argues that classifier predicates are stored as a lexicon of templates that are parameterized on the locations/orientations of these spatial entities.





















• Both engineering & linguistic motivations.

Liddell’s Argument for Lexicalization

• Rejects the assertion that a spatial model is not necessary.
  – Failings of non-spatial polymorphemic CP models: unless they posit very many morphemes, they are under-productive.

• Rejects naïve, visually representative/analogous paralinguistic descriptions of classifier predicates.
  – These models are over-productive: they predict unattested ASL constructions corresponding to any imaginable movement, and they cannot explain why these constructions do not occur.

• A parameterized CP lexicon explains these restrictions (a movement with no template in the lexicon is not produced) but incorporates the spatial productivity of the visually analogous model.


Summary

Where we’re at…

• Presented an MT approach for ASL classifier predicates.
• Proposed a “Multi-Path Pyramidal” architecture.
• Uses HMS lab virtual reality software.
• Design is analogous to Liddell’s recent CP model.
  – Reached the same design from an engineering approach.
  – System could serve as a test-bed for the model.

• Survey, analysis, design draft, and specification are done. Implementation has not started yet… Suggestions?

Questions?

Is the VR really an interlingua?

• Depends on your definition and on how it is implemented.
  – Language neutral: 3D coordinates and VR info are not language specific. But ASL PAR selection/filling might use other info.
  – Semantic representation: Yes, a model for 3D spatial domains.
  – Useful for translation: We’ve shown how it can be.
  – World knowledge beyond input semantics: Yes, in that it handles spatial/physics matters.

Let’s consider this…

Ontology vs. Domain

• Special property of ASL: it is easy to identify the ‘hard sentences’ requiring an interlingua.
  – We only need to build interlingual resources to cover these domains (e.g., moving vehicles, furniture layout, etc.).

• But these limited domains are all similar: they discuss 3D location, movements, and dimensions.
  – So the ontological expressiveness of this interlingua doesn’t have to be nearly as powerful as in most systems.
  – Abstract concepts, beliefs/intentions, quantification… are not needed.
  – Not just the things, but also the types of things, are limited.

References Cited

Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Parameterized action representations and natural language instructions for dynamic behavior modification of embodied agents. AAAI Spring Symposium.

Bangham, Cox, Lincoln, Marshall. 2000. Signing for the deaf using virtual humans. IEE 2000.

Liddell. 2003. Sources of Meaning in ASL Classifier Predicates. In Karen Emmorey (ed.). Perspectives on Classifier Constructions in Sign Languages. Workshop on Classifier Constructions, La Jolla, San Diego, California.

Schuler. 2003. Using model-theoretic semantic interpretation to guide statistical parsing and word recognition in a spoken language interface. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL’03), Sapporo, Japan.
