A Visual Imagination Approach to Cognitive Robotics

Fordham University

DigitalResearch@Fordham

Faculty Publications Robotics and Computer Vision Laboratory

5-2010

Damian M. Lyons, Fordham University, [email protected]

Sirhan Chaudhry, Fordham University

D. Paul Benjamin, Pace University


This Conference Proceeding is brought to you for free and open access by the Robotics and Computer Vision Laboratory at DigitalResearch@Fordham. It has been accepted for inclusion in Faculty Publications by an authorized administrator of DigitalResearch@Fordham. For more information, please contact [email protected].

Recommended Citation

Lyons, Damian M.; Chaudhry, Sirhan; and Benjamin, D. Paul, "A Visual Imagination Approach to Cognitive Robotics" (2010). Faculty Publications. 8. https://fordham.bepress.com/frcv_facultypubs/8


A Visual Imagination Approach to Cognitive Robotics*

Damian M. Lyons, Sirhan Chaudhry

Robotics and Computer Vision Laboratory, Dept. of Computer & Information Science

Fordham University, NY 10458

D. Paul Benjamin, School of Computer Science

Pace University, NY

Symposium on Understanding the Mind and Brain

Tucson, Arizona, May 2010

*Supported in part by DOE grant DE-FG02-08CH11542

Fordham Robotics & Computer Vision

Overview of Talk

Introduction & Motivation

Approach: Visual simulation

Method: Match-mediated difference

Method: View & Object Synchronization

Experimental Results

Summary & Conclusions



Motivation: “Cognitive Robotics”

Build robot systems capable of reasoning about all the kinds of complex phenomena that occur in everyday, real-world interactions.



Fordham Urban Search and Rescue Team (FUSAR)



Application: Reconnaissance, Security, Search and Rescue

Unstable terrain – reason about what may happen if terrain is disturbed

Dynamic terrain events – reason about where to go to avoid damage

Implicit collaboration – reason about how to contribute to ongoing task

Explicit collaboration – reason about when help is needed



Specific Issue: Model objects with complex behaviors

Wall collapsing - bricks/debris go where?

Object rebounds off one or more surfaces - where does it end up?

Unstable surface begins to slip - where does it slide to?

Make a judgment about the ‘precariousness’ of an object




Reactive Approach

A behavior-based, reactive solution to visual tracking is possible.

But an interposing wall, another agent, or a collection of objects complicates this.

Need to model the potential behaviors of, and interactions with, other objects/agents to allow prediction.


Cognitive Approach

[Shanahan 2006] has proposed that cognitive functions such as anticipation and planning operate through a process of internal simulation of actions and environment.

Craik (via Péter Érdi): the mind constructs small-scale models of reality to predict events.



The Minimal Subscene

Itti & Arbib (2005) define the minimal subscene as a middle ground between visual attention and language.

Arguably a better place to start for robots than the VISIONS STM structure (because it contains ‘verbs’)


Arbib’s ‘Schema Theory’ [Arbib 1981,1998]

Perceptual schemas:

• Determine whether the environment will support (afford) the task

• Continuously extract parameters for the task

Motor schemas:

• Are the control systems that exploit such parameters

• Can be coordinated to effect a wide variety of actions


The Minimal Subscene and ‘Visual Imagination’ module

[Diagram] Components:

• Minimal Subscene: current motor and perceptual schemas, plus other related motor and perceptual schemas

• Fusion of Visual Attention

• Internal Simulation

• Planning & Learning

• Library of Perceptual and Motor Schemas

• ‘Visual Imagination’

• Planned activities

• Visual ‘output’


Our approach to the Mirror Subsystem

Ongoing work with Benjamin at Pace since 2008

Use 3D game engine (OGRE) to simulate physics/appearance.

Compare graphical output of 3D simulation with actual video image from robot camera

Acts as an ‘imagination’ sensor


Comparing Real and Synthetic Video imagery

PROS

• Potentially fast (image comparisons)

• Doesn’t require visual attention to know anything about the simulation

• Interface between schemas and simulation grounded in visual semantics

CONS

• Comparing a graphical image and a visual image is much harder than comparing two visual images


Working Example

Predicting, tracking, and intercepting a target that undergoes multiple collisions with its environment.


The Minimal Subscene: Schema Assemblage

[Diagram] Components:

• Motor Schema: InterceptRollingObject

• Perceptual Schema: Scene Background

• Perceptual Schema: Rolling Ball

• Connection labels: navigation, target

Arbib’s ‘Schema Theory’ [Arbib 1981,1998]


The Minimal Subscene & the Mirror System

[Diagram] Components:

• Motor Schema: InterceptRollingObject

• Perceptual Schema: Scene Background

• Perceptual Schema: Rolling Ball

• Connection labels: navigation, target

• Fusion of Visual Attention

• 3D Simulation

• 3D Rendering

• Camera


Scene Background Perceptual Schema

[Diagram] Components:

• Perceptual Schema: Scene Background

• Camera Simulation

• Synchronization

• Match Mediated Difference (MMD)

• New, Missing, or Unexpected Elements

• Perceptual Schema: New ‘object’

• Fusion of Visual Attention

• Signal labels: Pr, Ps; Synthetic Image; Visual Image; He


Rolling Object Perceptual Schema

[Diagram] Components:

• Perceptual Schema: Rolling Object

• Camera Simulation

• Synchronization

• Match Mediated Difference (MMD)

• Signal labels: Pr, Ps; Synthetic Image; Visual Image; Prediction Request; Motion Correction


Filling the Scene Background


Our Scene Background

OGRE ‘room’ with floor and walls

Texture map visual image onto surface

Represent robot by simulation camera



Mirror Aspect of Visual Imagination: Scene Background

Need to keep real camera and simulated view (virtual camera) synchronized, so that

Difference operation between views will only yield

unexpected objects or

unexpected motions of expected objects



Mirror Aspect of Visual Imagination: objects

Need to keep simulated objects in synchronization with their observed behavior, so that

Difference operation between views will only yield

unexpected objects or

unexpected motions of expected objects



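Comparing Real & Synthetic Images: The problem

[Figure] Panels: Ir (real image), Is (synthetic image), |Is – Ir| (difference image), I’r = He Ir (warped real image), |Is – I’r| (warped difference image)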
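The two difference images on this slide can be illustrated with a minimal NumPy sketch. It assumes grayscale images stored as 2-D arrays and a known 3x3 error homography; the helper names `warp_image` and `abs_difference` are hypothetical, and nearest-neighbour inverse mapping is just one simple way to apply He (the talk does not say how the warp is implemented).

```python
import numpy as np

def warp_image(image, H):
    """Warp a grayscale image by homography H (inverse mapping,
    nearest-neighbour sampling). Pixels that map outside become 0."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous coordinates of every output pixel, shape (3, h*w).
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ dst              # source location of each output pixel
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape(h, w)

def abs_difference(a, b):
    """Per-pixel absolute difference |a - b|."""
    return np.abs(a.astype(float) - b.astype(float))

# Toy data (assumption): the real image is the synthetic image shifted 2 px right.
I_s = np.zeros((8, 8))
I_s[3:5, 2:4] = 1.0
I_r = np.roll(I_s, 2, axis=1)

# Error homography that undoes the shift: I'_r = H_e I_r.
H_e = np.array([[1.0, 0.0, -2.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
I_r_warped = warp_image(I_r, H_e)

plain = abs_difference(I_s, I_r).sum()          # |Is - Ir|
warped = abs_difference(I_s, I_r_warped).sum()  # |Is - I'r|
```

In this toy case the plain difference is non-zero purely because of the viewpoint offset, while the warped difference vanishes. With real rendered versus camera imagery, appearance differences remain even after warping, which is the problem the slide points to.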


Match-mediated Difference Mask (MMDM)

e(p) = | p – m( p ) |

• Define the normalized match quality q(p) to be the inverse of the match error

$$q(p) = \frac{1/e(p)}{\sum_{p' \in P} 1/e(p')}$$

• Place a normalized Gaussian at each point p’ in the set of match points P

$$\frac{1}{|P|} \sum_{p' \in P} G_v(p - p')$$

where $G_v(p - p')$ is the normalized Gaussian with spread parameter v centered at p’


MMDM (Cont.)

$$I_m(p) = \frac{1}{|P|} \sum_{p' \in P} q(p')\, G_v(p - p')$$

where $G_v(p - p')$ is the normalized Gaussian with spread parameter v centered at p’
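A minimal NumPy sketch of the mask construction follows. It assumes that the match quality is the inverse match error normalized to sum to one, that the mask is the quality-weighted average of Gaussians placed at the match points, and that the Gaussian is the standard isotropic 2-D form with standard deviation `v`. The exact normalization used in the talk is not recoverable from the slide, so these forms and the names `match_quality` and `mmd_mask` are assumptions.

```python
import numpy as np

def match_quality(points, matches):
    """Normalized match quality q(p): inverse of e(p) = |p - m(p)|,
    scaled so that the qualities sum to 1."""
    err = np.linalg.norm(points - matches, axis=1)
    inv = 1.0 / np.maximum(err, 1e-6)   # guard against a zero match error
    return inv / inv.sum()

def mmd_mask(shape, points, quality, v):
    """Quality-weighted average of normalized 2-D Gaussians,
    one centered on each match point (x, y)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape)
    for (px, py), q in zip(points, quality):
        d2 = (xs - px) ** 2 + (ys - py) ** 2
        mask += q * np.exp(-d2 / (2.0 * v ** 2)) / (2.0 * np.pi * v ** 2)
    return mask / len(points)

# Toy data (assumption): one well-matched point and one poorly matched point.
points = np.array([[5.0, 5.0], [15.0, 15.0]])
matches = np.array([[5.0, 6.0], [15.0, 19.0]])   # match errors 1 and 4
q = match_quality(points, matches)
I_m = mmd_mask((20, 20), points, q, v=2.0)
```

The mask is largest around the well-matched point, smaller around the poorly matched one, and close to zero far from any match.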


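Match-mediated Difference Image

$$I_d(p) = \frac{|I_s(p) - I'_r(p)|}{I_m(p)}$$

$$I_m(p) = \frac{1}{|P|} \sum_{p' \in P} q(p')\, G_v(p - p')$$

where $G_v(p - p')$ is the normalized Gaussian with spread parameter v centered at p’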
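The sketch below shows one way the mask could mediate the difference operation. It assumes the absolute difference between the synthetic image and the warped real image is divided by the mask, so that differences in well-matched regions are suppressed relative to differences where no matches were found. The function name `mmd_image` and the small constant `eps` (added to avoid division by zero) are assumptions and not part of the talk.

```python
import numpy as np

def mmd_image(I_s, I_r_warped, I_m, eps=1e-6):
    """Match-mediated difference: |I_s - I'_r| scaled by 1 / I_m.
    eps is a hypothetical guard for pixels where the mask is zero."""
    diff = np.abs(I_s.astype(float) - I_r_warped.astype(float))
    return diff / (I_m + eps)

# Toy data (assumption): the same raw difference at two pixels,
# one inside a well-matched region and one outside it.
I_s = np.zeros((4, 4))
I_r_warped = np.zeros((4, 4))
I_r_warped[0, 0] = 1.0
I_r_warped[3, 3] = 1.0
I_m = np.full((4, 4), 0.1)
I_m[0, 0] = 1.0            # strong match support at pixel (0, 0)

I_d = mmd_image(I_s, I_r_warped, I_m)
```

Both pixels have the same raw difference, but the one without match support ends up with the larger match-mediated value, which is the behavior wanted for flagging new or unexpected elements.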


Summary of difference images

• Difference image: |Is – Ir|

• Warped difference image: |Is – I’r|

• MMDI: $I_d(p) = |I_s(p) - I'_r(p)| / I_m(p)$



Object Missing



New Object



Synchronization Method

Projection matrix P = K [R | τ] where

K is the intrinsic matrix, and

[R | τ] is the extrinsic matrix with rotation R and translation τ.

Synthetic matrix Ps = Ks [ I | 0 ]

Real matrix Pr = Kr [R | τ].
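As a concrete illustration of these definitions, the following NumPy sketch builds the synthetic and real projection matrices and projects one 3-D point through each. The intrinsic values, the 5 degree yaw, and the helper names `projection_matrix` and `project` are illustrative assumptions only.

```python
import numpy as np

def projection_matrix(K, R, tau):
    """P = K [R | tau]: 3x3 intrinsics times the 3x4 extrinsic matrix."""
    return K @ np.hstack([R, np.asarray(tau, float).reshape(3, 1)])

def project(P, X):
    """Project a 3-D point X to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Assumed intrinsics, shared here by the synthetic and real cameras.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Synthetic camera: Ps = Ks [I | 0].
P_s = projection_matrix(K, np.eye(3), np.zeros(3))

# Real camera: Pr = Kr [R | tau], here a 5 degree yaw and zero translation.
a = np.deg2rad(5.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
P_r = projection_matrix(K, R, np.zeros(3))

X = np.array([0.0, 0.0, 2.0])   # a point 2 units in front of the camera
u_s = project(P_s, X)           # lands on the principal point
u_r = project(P_r, X)           # shifted horizontally by the yaw
```

The horizontal offset between `u_s` and `u_r` is the kind of view discrepancy that the synchronization step is meant to remove.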


Synchronization method

Relationship between the error homography He produced by the MMD module and camera projection matrices:

$$H_e = K_s \left( R - \frac{\tau\, n^T}{d} \right) K_r^{-1}$$

(where n is the normal to the image plane, d is the depth of the image plane, and τ is the translation)
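A small NumPy sketch of this relationship is given below, assuming the standard plane-induced homography form with a minus sign on the translation term (the sign depends on the convention chosen for n and d, which the slide does not fix). The function name `plane_homography` and the numeric values are hypothetical.

```python
import numpy as np

def plane_homography(K_s, K_r, R, tau, n, d):
    """H_e = K_s (R - tau n^T / d) K_r^{-1}; sign convention is an assumption."""
    return K_s @ (R - np.outer(tau, n) / d) @ np.linalg.inv(K_r)

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
n = np.array([0.0, 0.0, 1.0])   # plane normal
d = 2.0                          # plane depth

# No rotation, no translation, same intrinsics: the homography is the identity.
H_identity = plane_homography(K, K, np.eye(3), np.zeros(3), n, d)

# Pure rotation (translation assumed negligible): R can be read back
# from the homography as K_s^{-1} H_e K_r.
a = np.deg2rad(3.5)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a), np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
H_e = plane_homography(K, K, R, np.zeros(3), n, d)
R_recovered = np.linalg.inv(K) @ H_e @ K
```

The second case is the one the synchronization loop relies on: when the translation is small, the rotation error follows directly from the error homography and the two intrinsic matrices.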


Synchronization loop

Assuming that the translation is small*, we use the following formula to iteratively synchronize:

$$R_s(t+1) = R_s(t)\, g\!\left(K_s^{-1} H_e K_r\right)$$

* [Guerrero et al 2005] describe an algorithm to calculate both R and the translation
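The iterative update can be sketched as follows. Several parts are assumptions made only for illustration: the function g is taken to scale the angle of the estimated rotation error by a gain (via the axis-angle form), the error homography is simulated from a known rotation-only offset rather than produced by an MMD module, and all function names are hypothetical.

```python
import numpy as np

def rot_to_axis_angle(R):
    """Rotation matrix -> rotation vector (unit axis times angle)."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return angle * axis / (2.0 * np.sin(angle))

def axis_angle_to_rot(r):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    angle = np.linalg.norm(r)
    if angle < 1e-12:
        return np.eye(3)
    k = r / angle
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * Kx + (1.0 - np.cos(angle)) * (Kx @ Kx)

def g(R_err, gain):
    """Assumed form of g: apply only a fraction `gain` of the rotation error."""
    return axis_angle_to_rot(gain * rot_to_axis_angle(R_err))

def synchronize(R_s, R_real, K_s, K_r, gain, steps):
    """Iterate R_s(t+1) = R_s(t) g(K_s^{-1} H_e K_r); return R_s and error history."""
    errors = []
    for _ in range(steps):
        # Stand-in for the MMD module: rotation-only error homography.
        H_e = K_s @ (R_s.T @ R_real) @ np.linalg.inv(K_r)
        R_err = np.linalg.inv(K_s) @ H_e @ K_r
        errors.append(np.degrees(np.linalg.norm(rot_to_axis_angle(R_err))))
        R_s = R_s @ g(R_err, gain)
    return R_s, errors

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R_real = axis_angle_to_rot(np.radians(3.5) * np.array([0.0, 0.0, 1.0]))  # 3.5 degree roll
R_sync, errors = synchronize(np.eye(3), R_real, K, K, gain=0.2, steps=20)
```

Under these assumptions the remaining rotation error shrinks by a factor of (1 - gain) per iteration, so a larger gain converges in fewer steps. A real error homography estimated from noisy feature matches would presumably limit how large the gain can usefully be.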



Three points during synchronization

[Figure] Rows: t=5, t=10, t=20. Columns: A, B, C, D, E.

Three steps (t=5, 10, 20) during the 20-step synchronization of real and synthetic images. Column (A) real image, (B) synthetic image with corner points, (C) warped image, (D) MMD mask, and (E) (zero) MMD image.



Synchronization with non-zero MMDI


Mirror System: Object Motion

MMDI yields the difference between the actual and predicted position of the target

Given:

Camera angle is known

Frame rate is known

Ground plane assumption (if we don’t have stereo)

We can calculate the force difference
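The givens listed above are enough to turn image positions into ground-plane positions and velocities, which is the information a force correction needs. The sketch below is a minimal illustration under stated assumptions: a pinhole camera at a known height, pitched down by a known tilt angle, looking at a flat ground plane. The parameter names, the camera model, and the functions `pixel_to_ground` and `ground_velocity` are hypothetical.

```python
import numpy as np

def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, tilt):
    """Back-project pixel (u, v) onto the ground plane.
    Returns (X, Y): lateral offset and forward distance from the camera foot."""
    dx, dy = (u - cx) / fx, (v - cy) / fy           # ray direction in the camera frame
    denom = dy * np.cos(tilt) + np.sin(tilt)        # downward component of the ray
    if denom <= 0:
        raise ValueError("pixel ray does not hit the ground plane")
    s = cam_height / denom                           # ray scale at the ground
    return np.array([s * dx, s * (np.cos(tilt) - dy * np.sin(tilt))])

def ground_velocity(pix_prev, pix_curr, frame_rate, **cam):
    """Ground-plane velocity from the object's pixel position in two frames."""
    p0 = pixel_to_ground(*pix_prev, **cam)
    p1 = pixel_to_ground(*pix_curr, **cam)
    return (p1 - p0) * frame_rate

# Assumed camera: 1 unit above the ground, pitched down 45 degrees.
cam = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0,
           cam_height=1.0, tilt=np.radians(45.0))

p_center = pixel_to_ground(320.0, 240.0, **cam)     # where the optical axis meets the ground
vel = ground_velocity((320.0, 240.0), (320.0, 290.0), frame_rate=30.0, **cam)
```

Applying the same conversion to the object's position in the real image and in the synthetic image gives the two ground-plane positions whose difference drives the correction.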


Corrective Force

• Real object at pr(t)

• Simulated at ps(t)

• Correction: fsr(t) = k( pr(t) - ps(t) )

• Slow down or speed up

• Back towards observed track

• Until fsr(t) falls below a threshold

[Diagram] Labels: pr(t), ps(t), fsr(t)
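The correction rule can be illustrated with a short NumPy sketch. The force law is the one on the slide; everything else is an assumption made for illustration: the observed and simulated objects share the same velocity but start with a position offset, the force acts as a first-order (overdamped) correction on the simulated position, and `eps` is a hypothetical threshold on the force magnitude. In the actual system the force would be applied through the physics engine.

```python
import numpy as np

def corrective_force(p_r, p_s, k):
    """f_sr(t) = k (p_r(t) - p_s(t))."""
    return k * (np.asarray(p_r, float) - np.asarray(p_s, float))

def synchronize_object(p_r0, p_s0, velocity, k, dt, eps, max_steps=1000):
    """Apply the corrective force until its magnitude drops below eps.
    Returns the final real position, simulated position, and step count."""
    p_r = np.asarray(p_r0, float)
    p_s = np.asarray(p_s0, float)
    v = np.asarray(velocity, float)
    for step in range(max_steps):
        f = corrective_force(p_r, p_s, k)
        if np.linalg.norm(f) < eps:
            return p_r, p_s, step
        p_r = p_r + v * dt             # observed object keeps moving
        p_s = p_s + v * dt + f * dt    # simulated object: same motion plus correction
    return p_r, p_s, max_steps

# Toy data (assumption): simulated object starts 0.5 units off the observed track.
p_r, p_s, steps = synchronize_object(
    p_r0=[1.0, 1.0], p_s0=[0.6, 1.3], velocity=[0.5, 0.0],
    k=5.0, dt=0.1, eps=0.01)
```

With this first-order model the offset shrinks by a factor of (1 - k dt) per step, so the gain k trades correction speed against sensitivity to noisy position estimates.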


Synchronizing with object: roll

[Figure] Panels: Roll – no correction; Roll – with correction


Synchronizing with object: bounce

[Figure] Panels: Bounce – no correction; Bounce – with correction



Summary & Future work

Novel Cognitive Architecture for visual imagination to support prediction of complex behavior

Mirror system to synchronize world and objects: relationship to the neural mechanism that implements mirror neurons?

Next steps:

Use bounce prediction

Compare with tracking

Real-time implementation

Integration with mapping and navigation


Landmark representation

[Diagram] Labels: Image; Pixels mapped to depth; Terrain Spatiogram (TSG)

• Combines appearance and terrain spatial information

• Fast comparison operation

• Robust to occlusion

• Integrate over multiple views



The End

Thank You


Roll Error during Synchronization

gain=0.02,0.05,0.25,0.2

[Figure] Roll Error (0 to 4) versus Iteration Step (1 to 61); legend: g=0.01, g=0.05, g=0.15, g=0.2

Graph of the Roll Error versus Iteration step during synchronization for four values of gain for an initial 3.5 degree error between synthetic and real images


Roll, Pitch and Yaw error

[Figure] Three rows of plots, each row showing Roll Error (0 to 25), Pitch Error (-2.5 to 0), and Yaw Error (-2.5 to 0) versus Iteration; the rows span iterations 1 to 81, 1 to 52, and 1 to 37; panels labeled (A), (B), (C)

Graphs of roll (A), pitch (B) and yaw (C) error during synchronization for gain values (top to bottom) of 0.05, 0.1, 0.15 with initial error of roll 23, yaw 2.9, and pitch 2.9 degrees


Previous Work

Integrating simulation with robot programming or robot design [Albrecht et al 06, Diankov et al 08].

Polybot/Polyscheme [Cassimatis et al. 2004]: planning is viewed as a sequence of mental ‘simulations’ that include physical effects.

Overlaying simulation graphical output on visual imagery [Bejczy et al 95, Burkert et al 04].

Comparing real and synthetic video [Rushmeier et al 95].

Integrating OGRE & SOAR [Benjamin et al 2006], Match-Mediated Difference operation [Lyons et al 2009].
