Lovett Thesis Final


Copyright by Andrew Lovett 2012

All Rights Reserved


    ABSTRACT

    Spatial Routines for Sketches: A Framework for Modeling Spatial Problem-Solving

    Andrew Lovett

Spatial problem-solving tasks are often used to evaluate people's cognitive abilities. For example,

Raven's Progressive Matrices is a popular intelligence test. In it, an individual is shown an array of two-

    dimensional images, with one image missing. The individual must compare the images and identify a

    pattern of differences between them, in order to solve for the missing image. Performance on tasks such

as Raven's and geometric analogy (A is to B as C is to ...?) correlates strongly with performance on

    many other ability tasks, in the spatial, verbal, and mathematical domains. Thus, these tasks appear to

    depend on core, general-purpose representations and processes. However, it is as yet unclear what those

    representations and processes are.

    To better understand these tasks, we developed Spatial Routines for Sketches (SRS), a general

    framework for modeling spatial problem-solving. SRS is based on a set of psychological claims about

    how people perform spatial problem-solving: 1) When possible, people use qualitative representations

    describing features such as relative position or orientation, rather than exact numerical values. 2) Spatial

    representations are hierarchical. A given image might be represented as object groups, individual objects,

    or the parts within each object. 3) Qualitative spatial representations can be compared via structure-

    mapping. Structure-mapping involves aligning the relational structure in two representations to find the

    corresponding elements.

Three task models were built within the SRS framework: geometric analogy, Raven's Progressive

    Matrices, and the oddity task, in which one sees a set of images and picks the one that is different. The

    three task models use identical representations and similar processes. Thus, they allow us to test the

    generality of the psychological claims, as well as the representations and processes that implement these

    claims.


    TABLE OF CONTENTS

    1. Introduction ............................................................................................................................................16

    1.1 Motivation ......................................................................................................................................19

    1.2 Background ....................................................................................................................................21

    1.3 Claims .............................................................................................................................................29

    1.4 Contributions ..................................................................................................................................39

    1.5 Evaluation .......................................................................................................................................40

    1.6 Outline of the Thesis ......................................................................................................................41

    2. Background ............................................................................................................................................42

    2.1 Hybrid Representations ..................................................................................................................43

    2.2 Hierarchical Representations ..........................................................................................................55

    2.3 Structural Comparison ....................................................................................................................61

    2.4 Mental Rotation: An Example Domain ..........................................................................................62

2.5 Strategic Comparison .....................................................................................................68

    3. Modeling Perception and Comparison ...................................................................................................72

    3.1 Existing Models ..............................................................................................................................73

    3.2 Perceptual Sketchpad......................................................................................................................77

    3.3 Image Comparison .........................................................................................................................91

    3.4 Shape Comparison ..........................................................................................................................97

    3.5 Perceptual Reorganization ............................................................................................................109

    4. Spatial Routines for Sketches ..............................................................................................................112

    4.1 Operation Categories ....................................................................................................................114

    4.2 Spatial Operations ........................................................................................................................118

    4.3 Other Operations ..........................................................................................................................153


    4.4 Spatial Routine Language .............................................................................................................154

    4.5 Gathering Data .............................................................................................................................166

    5. Simulations ..........................................................................................................................................169

    5.1 Representation ..............................................................................................................................170

    5.2 Task Comparison ..........................................................................................................................173

    5.3 Parameters and Sensitivity Analysis ............................................................................................174

    5.4 Analyses .......................................................................................................................................175

    6. Geometric Analogy ..............................................................................................................................177

    6.1 Background ..................................................................................................................................178

    6.2 Model............................................................................................................................................181

    6.3 Model Predictions .........................................................................................................................189

    6.4 Behavioral Experiment .................................................................................................................190

    6.5 Simulation ....................................................................................................................................200

    6.6 Related Work ................................................................................................................................205

    6.7 Conclusion ....................................................................................................................................207

7. Raven's Progressive Matrices .............................................................................................209

    7.1 Background ..................................................................................................................................210

    7.2 Model............................................................................................................................................215

    7.3 Behavioral Experiment .................................................................................................................225

    7.4 Simulation ....................................................................................................................................228

    7.5 Discussion ....................................................................................................................................235

    7.6 Related Work ................................................................................................................................238

    8. The Oddity Task ..................................................................................................................................241

    8.1 Background ..................................................................................................................................242

    8.2 Model............................................................................................................................................243


    LIST OF FIGURES

    Figure 1.1: Geometric analogy problem .....................................................................................................16

    Figure 1.2: Three spatial problem-solving tasks .........................................................................................18

    Figure 1.3: An oddity task problem involving groups of objects ...............................................................32

Figure 1.4: Two related geometric analogy problems, along with average reaction times and overall accuracies for human participants ...............................................................................................................35

Figure 2.1: Two oddity task problems from Dehaene et al. (2006). Pick the image that doesn't belong ... 42

Figure 2.2: Stimuli from (A) Navon (1977, Experiment 4), and (B) Love, Rouder, and Wisniewski (1999) ...............................................................................................................................................................58

    Figure 2.3: Two images that one might compare ........................................................................................60

Figure 2.4: Mental rotation stimuli from (A) Shepard and Metzler (1971), and (B) Cooper and Shepard (1973) ..........................................................................................................................................................62

    Figure 2.5: Reaction time results from Biederman and Bar (1999, Experiment 2) ...................................67

    Figure 3.1: Example stimuli from Love, Rouder, and Wisniewski (1999) .................................................72

    Figure 3.2: The CogSketch sketch understanding system ..........................................................................74

    Figure 3.3: Sketch of a house ......................................................................................................................78

    Figure 3.4: Examples of texture patches .....................................................................................................79

    Figure 3.5: Examples of space objects ........................................................................................................79

    Figure 3.6: Grouping examples ...................................................................................................................80

    Figure 3.7: Examples of parallel edges, parallel objects, and parallel groups ............................................82

    Figure 3.8: Example representation with all expressions about Edge-4 in a parallelogram .......................85

    Figure 3.9: A rotation between two triangles ..............................................................................................85

    Figure 3.10: Two images that might be compared, with the objects labeled ..............................................92


    Figure 3.11: Two ambiguous mappings ......................................................................................................94

    Figure 3.12: Object pairs whose textures would be considered identical ...................................................95

    Figure 3.13: Shape change examples ..........................................................................................................96

    Figure 3.14: Shape rotation example ..........................................................................................................98

    Figure 3.15: A) Spatial reflections. B) Shape reflections .........................................................................101

    Figure 3.16: Examples of lengthening ......................................................................................................104

    Figure 3.17: Example of part lengthening ................................................................................................104

    Figure 3.18: Examples of part addition .....................................................................................................105

    Figure 3.19: Example of a subshape deformation .....................................................................................105

    Figure 3.20: Corresponding edges for a symmetry mapping ....................................................................106

    Figure 3.21: Example of a scaled group ...................................................................................................107

    Figure 3.22: Complex reorganization facilitates this comparison ............................................................111

    Figure 4.1: An ambiguous image ..............................................................................................................113

Figure 4.2: A: Oddity task problem from Dehaene et al. (2006). B: Raven's Matrix-type problem .........115

    Figure 4.3: Geometric analogy problems ..................................................................................................117

    Figure 4.4: A geometric analogy problem, as it looks in CogSketch .......................................................120

Figure 4.5: A Raven's Progressive Matrices "A" problem (not from the actual test) ...............................120

    Figure 4.6: A row of images for difference-finding ..................................................................................129

Figure 4.7: Raven's Progressive Matrix problems requiring a literal pattern of variance .........................131

    Figure 4.8: A row of images with first-to-last matches ............................................................................135

    Figure 4.9: Oddity task problems with orientation differences .................................................................136

    Figure 4.10: A: A geometric analogy problem. B: The patterns of variance ............................................137


    Figure 4.11: Examples of Infer-Shape with reflections ............................................................................140

    Figure 4.12: Examples of Infer-Shape with deformations ........................................................................141

    Figure 4.13: Examples of Infer-Shapes with texture transformations ......................................................141

    Figure 4.14: A geometric analogy problem ..............................................................................................142

    Figure 4.15: A geometric analogy problem with an added object ............................................................143

Figure 4.16: Four 3x3 Raven's Matrix problems .......................................................................................145

    Figure 4.17: A geometric analogy problem with an unclear rotation .......................................................148

Figure 4.18: A Raven's Matrix problem requiring complex perceptual reorganization ............................149

Figure 4.19: A Raven's Matrix problem requiring texture detection .........................................................150

    Figure 4.20: A Bongard problem from (Bongard, 1970) ..........................................................................163

    Figure 4.21: The Routine Inspector on a geometric analogy problem ......................................................167

Figure 5.1: A: Geometric analogy problem (Evans, 1968). B: Raven's Matrix problem. C: Oddity task problem (Dehaene et al., 2006) .................................................................................................................169

    Figure 5.2: Each internal edge of a textured object was added as a separate object .................................172

    Figure 6.1: A geometric analogy problem ................................................................................................178

    Figure 6.2: Image pairs requiring perceptual reorganization ....................................................................182

Figure 6.3: Strategy for finding differences between two images (see Appendix B for its implementation in the Spatial Routine language) ...............................................................................................................184

    Figure 6.4: Geometric analogy problems requiring second-order comparison .........................................187

Figure 6.5: The answer chosen depends on whether one prefers canonical reflections or rotations ........189

Figure 6.6: Reflecting the B shape will produce an identical B shape, as in Figure 6.4B ........................190

    Figure 6.7: The dot to be removed is in different locations in A and C ....................................................190

    Figure 6.8: Problems 1-3 (times are seconds required for human participants to pick an answer; values below answers are the percentage of participants who picked each answer) ............................................194


    Figure 6.9: Problems 4-6 ...........................................................................................................................195

    Figure 6.10: Problems 7-9 .........................................................................................................................196

    Figure 6.11: Problems 10-12 .....................................................................................................................197

    Figure 6.12: Problems 13-15 .....................................................................................................................198

    Figure 6.13: Problems 16-18 .....................................................................................................................199

    Figure 6.14: Problems 19-20 .....................................................................................................................200

    Figure 7.1: A matrix problem ...................................................................................................................209

    Figure 7.2: Example problems for various Carpenter et al. (1990) rules ..................................................211

    Figure 7.3: This matrix problem requires complex perceptual reorganization to solve............................214

    Figure 7.4: Two 1x1 matrix problems ......................................................................................................217

    Figure 7.5: Solution strategies for the above problems .............................................................................217

    Figure 7.6: Problems where computing the pattern of variance may be difficult .....................................219

Figure 7.7: Subroutine for finding differences in a row (see Appendix C for its implementation in the Spatial Routine language). Continued in Figure 7.8 .................................................................................221

    Figure 7.8: Subroutine for finding differences in a row. Continued from Figure 7.7 ..............................222

    Figure 7.9: Different ways two textures might overlap ............................................................................230

Figure 8.1: Oddity task problem from Dehaene et al. (2006). The image without parallel lines is the odd one out .......................................................................................................................................................242

    Figure 8.2: C and D can only be solved by considering edges .................................................................244

Figure 8.3: Certain information can be filtered out...................................................................................245

Figure 8.4: B and C require a different similarity measure .......................................................................246

Figure 8.5: One of the six problems the model failed to solve. Average human performance: 68% (American adults), 86% (Mundurukú) ......................................................................................................250

    Figure 8.6: Problems with closed and open shapes...................................................................................251


    Figure 8.7: Problems relying on (A) shape transformation and (B) shape symmetry ..............................251

    Figure 8.8: Problems requiring shape comparison between images .........................................................256

    Figure 8.9: Problems involving quantitative features ...............................................................................257

    Figure 8.10: Problem in which the quantitative difference is particularly salient ....................................258

    Figure 9.1: Two image pairs. It is harder to align the same-shaped objects in B than in A.....................260

    Figure 9.2: Can you spot the difference between each image pair? ..........................................................261

    Figure 9.3: Can you spot the difference between the more complex image pairs? ...................................261

    Figure 9.4: Object-level differences (A, B) may be more salient than edge-level differences (C) ...........262

    Figure 9.5: Problems involving shape rotations ........................................................................................264

Figure 9.6: An oddity task problem (A) and a Raven's matrix problem (B) with identical images ..........264

    Figure 9.7: This geometric analogy problem requires quantitative information to solve .........................265

    Figure 9.8: This geometric analogy problem presents a novel challenge .................................................265

    Figure A.1: Two images that might be compared .....................................................................................289


    LIST OF TABLES

    Table 3.1: Orientation-invariant qualitative vocabulary for edges ............................................................84

    Table 3.2: Additional orientation-specific vocabulary for edges ................................................................87

Table 3.3: Object-level qualitative vocabulary. Terms marked with an O are orientation-specific ............88

    Table 3.4: Candidate inferences between images 10A and 10B .................................................................92

    Table 3.5: Types of shape comparisons ...................................................................................................102

    Table 4.1: Pattern of variance for the first two images in Figure 4.6 ........................................................129

    Table 4.2: Generalized forms for predicates in difference representations ...............................................138

    Table 4.3: A template and examples of operation calls ............................................................................155

    Table 4.4: A template and examples of control structures ........................................................................160

    Table 4.5: A spatial routine for solving Bongard problems ......................................................................164

    Table 4.6: Pattern of variance for Figure 4.20 ..........................................................................................165

    Table 6.1: Linear model for human reaction times on geometric analogy ...............................................204

    Table 7.1: Linear regression for accuracy of Northwestern students ........................................................233

    Table 7.2: Second linear model for accuracy of Northwestern students ...................................................234

    Table 7.3: Third linear model for accuracy of Northwestern students......................................................235

    Table 7.4: Linear model for reaction times of Northwestern students (in seconds) .................................235

    Table 8.1: Accuracy of the model and each participant group on the 45 oddity task problems ...............248

    Table 8.2: Correlations in accuracy on each of the 45 problems (Pearsons r) .........................................249

    Table 8.3: Rankings of the 6 problems the model failed to solve (1 = easiest, 45 = hardest) ...................249

Table 8.4: Linear models for each group's accuracy .................................................................................253

Table 8.5: Linear models for each group's accuracy, with Elems2 ...........................................................253


Table 8.6: Linear models for each group's reaction times (in s) ...............................................................254


    1. Introduction

Spatial problem-solving tasks are a popular tool for evaluating people's cognitive abilities. For

example, in geometric analogy (Figure 1.1), an individual is shown an array of images and asked "A is to

B as C is to ...?" Like all the tasks I am studying, geometric analogy requires: 1) building up

    representations of two-dimensional images; 2) comparing those representations; and 3) identifying a

pattern across them. Thus, these tasks depend critically on one's ability to encode spatial relations within

an image, compare images, and abstract out higher-order relations between them based on what is

    common or different.

Figure 1.1: Geometric analogy problem.

    In the past, spatial problem-solving has been used to evaluate various abilities, from geometric

    knowledge (Dehaene et al., 2006) to general intelligence (Raven, Raven, & Court, 1998). However, there

    is disagreement about what exactly a particular task evaluates (e.g., geometric analogy: Sternberg, 1977;

    Mulholland, Pellegrino, & Glaser, 1980; Raven's Progressive Matrices: Carpenter, Just, & Shell, 1990;

    Primi, 2001). I believe we need a more concrete understanding of the representations and processes

    people use to solve these tasks. Applied to a single task, this might illuminate the abilities that separate

one person's performance from another's. Applied across tasks, it should help explain how people reason

    about space and spatial relations.


    For my thesis, I have constructed Spatial Routines for Sketches (SRS), a general framework for

building, evaluating, and comparing spatial problem-solving task models. By task model, I mean an end-

    to-end model of human performance on a task, which begins with visual input, generates a representation,

    reasons over the representation, and chooses an output behavior. While the SRS models may vary in their

    specific strategies, they are all built upon three core hypotheses:

1) When possible, people use qualitative, or categorical, representations to reason about space (e.g.,

    Biederman, 1987; Kosslyn et al., 1989; Forbus, Nielsen, & Faltings, 1991). In particular, they encode the

    qualitative spatial relations between elements in a visual scene.

    2) Qualitative, structural representations of space are compared via structure-mapping (Gentner,

    1983). According to structure-mapping theory, people compare two cases by aligning their common

    relational structure, thereby highlighting commonalities and differences across the cases.

    3) Spatial problem-solving requires flexibly moving between different levels in a hierarchy of spatial

    representations (Palmer, 1977) until a suitable level is found. A level of representation is suitable if,

    when the representations are compared, a pattern emerges which can be used to solve the problem. For

    example, Figure 1.1 involves the spatial relations between the objects in each image. However, other

    problems might require focusing on the edges in a single object, or backing out and considering groups of

    objects.

    I have two primary goals in modeling spatial problem-solving. The first is to evaluate whether the

    above hypotheses are sufficient for explaining human performance. A sufficient task model should meet

    two criteria: 1) The model should perform the task with a high degree of accuracy, making no more errors

    than a typical human would. 2) When the model does fail, its error patterns should match human error

    patterns, i.e., problems that are hard for the model should also be hard for people.


    occupations require that individuals be skilled at mentally manipulating representations of images. For

    example, when a surgeon is operating, they may be required to mentally construct a three-dimensional

    structure from a two-dimensional image.

    Importantly, real-life spatial visualization ability can be evaluated by abstract problems like paper-

    folding. Several studies have found correlations between performance on abstract tasks and real-world

    ability in surgeons (see Hegarty et al., 2007, for a review). Shea, Lubinski, & Benbow (2001) found that

    performance of 13-year-olds on spatial visualization tasks predicted what they would study in college and

    what eventual job they would get, even controlling for verbal and mathematical ability. Individuals with

    higher spatial ability were more likely to be scientists and engineers. This analysis was restricted to the

    top 1% of 13-year-olds, but ongoing research (Hedges & Chung, in preparation) suggests the findings

    generalize to the rest of the population.

    Thus, it appears that abstract, spatial problem-solving tasks tap into a spatial visualization ability that

    is useful in many advanced disciplines. If we can understand how people perform spatial transformations

    in these tasks, we will be better prepared to educate students in spatial skills, and hopefully enhance their

    ability to perform in those disciplines.

    General Problem-Solving Ability

    Many problem-solving tasks in the spatial, verbal, and mathematical domains have been used to

evaluate people's abilities. In the past, correlations in performance across these tasks caused Spearman

(1923; 1927) to put forward the notion of g, a single, general intelligence measure which could predict an

individual's ability across a large set of task domains. While this thesis is not directly concerned with g,

Spearman's work suggests that some mental abilities are utilized across a wide range of tasks. If spatial

problem-solving tasks tap into those abilities, then our models can give us insights into people's general

    reasoning processes.


In fact, there is strong evidence that one of the tasks, Raven's Progressive Matrices (RPM), does tap

into general abilities. RPM was originally designed to evaluate one major component of g, eduction, or

    the ability to identify meaningful patterns in confusing data (Raven, Raven, & Court, 1998). Since the

test's creation, several studies have shown it to be one of the highest single-test correlates with g (e.g.,

    Burke & Bingham, 1969; Zagar, Arbit, & Friedland, 1980; see Raven, Raven, & Court, 2000b, for a

review), meaning an individual's performance on RPM predicts their performance on many other

    intelligence tests. In a multi-dimensional scaling analysis of ability tests, Snow and colleagues (Snow,

    Kyllonen, & Marshalek, 1984; Snow & Lohman, 1989) found that RPM lies in the middle of the ability

    space. That is, while most ability tests cluster with other tests in the same domain, e.g., verbal,

mathematical, and spatial tests, RPM correlates highly with the most abstract ability tests from each

    domain. This suggests that although RPM is a visuo-spatial task, it evaluates several domain-general

    mental abilities. My hope, in modeling this task, is to gain a greater understanding of what those abilities

    are.

    1.2 Background

Two of the spatial problem-solving tasks, geometric analogy and the Raven's Progressive Matrices,

    have been studied extensively in the past, while research on the oddity task is more limited. In the

    following sections I summarize the research, focusing on two questions: 1) What are the processes people

    use to perform the task? 2) What factors contribute to the relative difficulty of each problem?

    Geometric Analogy

    Processes

    Geometric analogy (Figure 1.1, Figure 1.2A) has been studied and modeled by many researchers over

the last 50 years. Evans' ANALOGY (1968) was the first computational model of the task. A

    sophisticated program for its time, ANALOGY automatically encoded representations of the images in a


    problem and could even identify spatial transformations between shapes, such as rotations. Once

    representations had been constructed or hand-coded, it solved a problem via what has been termed an

    infer-infer-compare strategy (Mulholland, Pellegrino, & Glaser, 1980):

    1) Infer a mapping between images A and B, describing what changed between them.

2) For each possible answer n, infer a mapping between images C and n.

    3) Compare the A/B mapping to the C/n mapping for each possible answer, and choose the answer for

    which this comparison returns the closest match.
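As a rough illustration of the infer-infer-compare strategy (and not Evans' actual implementation), the following Python sketch encodes each image as a set of qualitative facts and treats a "mapping" as a simple set difference; the fact vocabulary and helper names are illustrative assumptions only.

```python
# Minimal sketch of infer-infer-compare. Each image is a set of qualitative
# facts, e.g. ("above", "dot", "triangle"). The set-based notion of "change"
# is a crude stand-in for a real structure-mapping between images.

def describe_change(image_x, image_y):
    """Describe the change from image_x to image_y as (facts removed, facts added)."""
    return image_x - image_y, image_y - image_x

def change_similarity(change_1, change_2):
    """Score how well two change descriptions match (higher is better)."""
    (rem1, add1), (rem2, add2) = change_1, change_2
    overlap = len(rem1 & rem2) + len(add1 & add2)
    mismatch = len(rem1 ^ rem2) + len(add1 ^ add2)
    return overlap - mismatch

def solve_infer_infer_compare(image_a, image_b, image_c, answers):
    ab_change = describe_change(image_a, image_b)                 # 1) infer the A -> B change
    scores = [change_similarity(ab_change,
                                describe_change(image_c, ans))    # 2) infer each C -> n change
              for ans in answers]
    return max(range(len(answers)), key=lambda i: scores[i])      # 3) pick the closest match

# Toy problem: the dot above the large shape disappears from A to B,
# so it should also disappear between C and the correct answer.
A = {("contains", "triangle"), ("contains", "dot"), ("above", "dot", "triangle")}
B = {("contains", "triangle")}
C = {("contains", "square"), ("contains", "dot"), ("above", "dot", "square")}
answers = [C,                                               # 0: nothing changes
           {("contains", "square")},                        # 1: dot removed (correct)
           {("contains", "square"), ("contains", "dot"),
            ("below", "dot", "square")}]                    # 2: dot moves below
print(solve_infer_infer_compare(A, B, C, answers))          # -> 1
```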

ANALOGY's mapping processes are unlike human mapping processes in that they perform an

    exhaustive search for the best possible mapping, rather than using heuristics (Gentner, 1983). However,

    the infer-infer-compare model remains a reasonable strategy for geometric analogy, and it is one we will

    return to.

Sternberg (1977) argued that people actually solve problems like geometric analogy via an infer-map-

    apply strategy (Mulholland, Pellegrino, & Glaser, 1980):

    1) Infer a mapping between images A and B, describing what changed between them.

    2) Compute a mapping between images A and C, identifying their corresponding elements.

    3) Based on this mapping, apply the A/B differences to C to compute D, a representation of the

    image that would best complete the analogy. This D r epresentation can then be compared to each

    possible answer to see which one best matches it.

    Sternberg produced evidence that people utilized infer-map-apply on his problem set, while

Mulholland, Pellegrino, & Glaser (1980) argued that people utilized infer-infer-compare on their problem

    set. Later, researchers suggested that people adjust their strategy depending on factors such as the

problem's complexity and the relatedness of the items being compared (Bethell-Fox, Lohman, & Snow,

    1984; see also Grudin, 1980, on verbal analogies).

    These early researchers were focused on the particular strategy, or the set of cognitive components,

    that a person would use to solve a geometric analogy. However, they were less clear on what specific


    mechanisms might be used, for example, to compare images A and B. Only Evans (1968) had a complete

    computational model. While computational models of geometric analogy have become more popular in

recent years (e.g., Bohan & O'Donoghue, 2000; Schwering et al., 2009), I believe no other researchers

    have built an end-to-end computational model that could be compared against human performance.

    Factors Contributing to Difficulty

The above researchers also analyzed the factors contributing to each problem's relative difficulty. In

    generating their stimuli, Mulholland, Pellegrino, & Glaser (1980) independently varied the number of

    elements in each image and the number of transformations between images, e.g., the number of

    differences between images A and B. They found that as either number of elements or number of

    transformations increased, the reaction time increased, while the accuracy decreased. Importantly, for

    reaction time the effects of elements and transformations were not simply additive; there was a

particularly great cost when both number of elements and number of transformations increased. The

    researchers believed this was due to working memory load. As the number of elements or

    transformations increased, the working memory load also increased. For particularly large numbers of

both, working memory load exceeded people's capacity, forcing them to change their problem-solving

    strategy, and resulting in substantial reaction time costs.

    Bethell-Fox, Lohman, & Snow (1984) independently varied several factors in their geometric analogy

    problems: number of elements in the images, number of transformations between images, figural vs.

    spatial transformations, number of possible answers to choose from, and similarity of distractors to the

    correct answer. They found that all these factors correlated with reaction times, while all factors except

number of transformations correlated with error rates. They also found some interesting interactions between the factors; for example, those problems with the most images and the most possible answers

    were significantly harder than other problems. Again, this may have been because the large memory load

    required a strategy shift.


    3) Distribution of three values: There are three different elements in the three images of the row.

    Every row must contain each of those three elements, but their order varies.

    4) Figure addition or subtraction: If the first image contains element X and the second image contains

element Y, the third image must contain elements X and Y. Subtraction is similar.

    5) Distribution of two values: There is an element that is present in two of the images but not the

    third.

    For example, to solve Problem 2B, one would study the images in the first row, note that the groups

    of three squares correspond to each other, and recognize that the relation between these groups can best

be described by a quantitative pairwise progression rule: the squares are becoming smaller. One would

then select the answer that best fits this rule in the bottom row.
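To make the rule vocabulary concrete, the hypothetical sketch below checks a row of images, encoded simply as sets of element labels, against three of the rules listed above; these predicates are simplified stand-ins, not the representations or code used by Carpenter, Just, and Shell (1990).

```python
# Illustrative checks for three of the rules, over a row given as three sets
# of element labels. Real matrix problems require far richer encodings.

def distribution_of_three(row):
    """Three distinct elements, each appearing in exactly one image of the row."""
    elems = set().union(*row)
    return len(elems) == 3 and all(sum(e in img for img in row) == 1 for e in elems)

def figure_addition(row):
    """The third image contains exactly the elements of the first two combined
    (subtraction would be checked analogously)."""
    return row[2] == row[0] | row[1]

def distribution_of_two(row):
    """Some element appears in exactly two of the three images."""
    return any(sum(e in img for img in row) == 2 for e in set().union(*row))

print(distribution_of_three([{"diamond"}, {"circle"}, {"triangle"}]))   # True
print(figure_addition([{"square"}, {"dot"}, {"square", "dot"}]))        # True
print(distribution_of_two([{"bar"}, {"bar", "dot"}, {"dot"}]))          # True
```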

Carpenter, Just, and Shell's (1990) first model, FAIRAVEN, could perform the APM about as well as

    the average participant from their subject pool. The second model, BETTERAVEN, performed at the

level of the best participants. BETTERAVEN performed better due to a couple of key changes: 1)

    BETTERAVEN could identify cases where there was an element in only two of the three images in a

row, meaning it could use the distribution of two values rule; 2) BETTERAVEN had better goal

    management. It identified candidate rules one at a time, and it backtracked when a candidate rule proved

    ineffective for solving a problem.

    Based on the difference between the models, the experimenters suggested that the ability to manage

    goals is a key factor in solving the hardest problems. They saw this as being linked to working memory: a

    complex problem requires that one manage a hierarchy of goals, and keeping them all in memory can be

    quite difficult.

    In their analysis, the researchers noticed another interesting difference between high- and low-

    performers on the task. In some cases, there was more than one rule that might be applied to solve a

    problem. For example, suppose the three corresponding elements were an arrow pointed up, an arrow

    pointed right, and an arrow pointed down. This set of elements could be seen as either a quantitative


    pairwise progression, in which an arrow shape gradually rotates, or a distribution of three, with three

    different shapes. While either of these rules is sufficient, the first rule is better, as it is more compact.

That is, one doesn't have to remember all three shapes individually. On problems such as these, verbal

    protocols showed that higher performers used the quantitative pairwise progression rule more often than

    lower performers.

    Carpenter et al. suggested the above result occurred simply because higher performers were more

    consistent about looking for the simpler rules before they looked for the more complex rules. However, I

    think it is likely that all participants looked for the simpler rules first. I interpret these results as

    suggesting, rather, that higher performers are better at identifying more abstract relationships between

    elements. Whereas the lower performers saw the arrows as different shapes, the higher performers were

    better able to compare the arrows, perform a spatial transformation, and identify a rotation between the

arrows' shapes. Identifying more abstract relations on problems like these would reduce working

    memory load and generally aid in solving the problems.

Carpenter et al. failed to address a couple of important components of human processing with their

    models. Firstly, the models did not generate representations from the images; all representations were

    hand-coded based on participant descriptions. The experimenters suggested the visual encoding process

    was irrelevant, given how well performance on the task correlates with other, non-visual tasks. However,

    this assumes visual encoding ability does not generalize to encoding ability in other modalities. I believe

    RPM taps into a general ability to encode a stimulus at the appropriate level of abstraction. Indeed, the

authors themselves suggest that an ability to decrease working memory load by using more abstract

representations might explain the correlation between RPM and the Towers of Hanoi problem, a very

different task.

    Secondly, the FAIRAVEN and BETTERAVEN models did not learn the five rules described above.

    Rather, abstract forms of all the rules were hard-coded into the models. Thus, the models fail to explain


    how individuals begin with basic comparisons of pairs of images and develop, over time, an

    understanding of a complex rule that holds across a row of images.

    Factors Contributing to Difficulty

    Two groups of researchers (Vodegel-Matzen, van der Molen, & Dudink, 1994; Embretson, 1998)

have evaluated Carpenter, Just, and Shell's (1990) analysis by creating their own, experimental matrix

    items in which they varied the number and type of Carpenter rules. Both groups found that a rule

    complexity measure, based on the number and difficulty of rules, correlated highly with problem

    difficulty: problems with more rules, and with more difficult rules, are harder to solve. This is likely

    because such problems both put a greater load on working memory and require more sophisticated

    problem-solving techniques. I believe a computational model which makes those problem-solving

    techniques explicit is needed to gain a better grasp on what exactly makes more complex problems

    harder.

    Primi (2001) designed new matrix items in which he independently varied the number of elements in

    each image, the number of rules describing a row of images, the complexity of rules, and perceptual

organization. Perceptual organization referred to how difficult it was to identify the corresponding

elements in a row of images. In his low perceptual organization problems, the correspondence-finding

    component was made difficult through misleading cues that encouraged test-takers to put the wrong

    elements into correspondence. Perhaps unsurprisingly, he found that difficulty of correspondence-finding

    was the greatest predictor of problem difficulty.

The official RPM problems, unlike the problems in Primi's test set, were not explicitly designed to

make correspondence-finding difficult. Nonetheless, there are some problems which invite inappropriate correspondences (Carpenter, Just, & Shell, 1990). In such cases, I suspect the most important skill is an

    ability to backtrack and look for alternate mappings when a set of correspondences prove insufficient for

    solving a problem. Thus, dealing with incorrect correspondences, like dealing with excessive memory

    load, may require an ability to evaluate and dynamically modify problem-solving strategies.


    Oddity Task

    Processes

    The oddity task (Figure 1.2B) is a set of 45 problems which Dehaene et al. (2006) constructed to

evaluate people's geometric knowledge. The experimenters hypothesized that the problems tap into core

    knowledge of geometry (Spelke & Kinzler, 2007), an innate, universal cognitive module. This module

    would presumably understand key geometric concepts like perpendicular lines, and would therefore be

    used to solve problems like Figure 1.2B.

    To test the universality of geometric knowledge, Dehaene et al. gave their 45 problems to participants

of varying ages from two cultural groups: North Americans and the Mundurukú, a South American

indigenous group. They found that the Mundurukú performed above chance on nearly all the problems,

    despite their lack of formal schooling in geometry and mathematics. Furthermore, there was a significant

correlation between the American and Mundurukú error patterns. The experimenters took this as

    evidence that the two groups were working with a shared, universal set of core geometric knowledge.

    As this work is relatively new, no other researchers have explored the processes people might utilize

    in performing the task. However, I have suggested people likely use processes similar to those used in the

previous tasks (Lovett & Forbus, 2011). Whereas geometric analogy involves comparing two images

    and noticing their differences, the oddity task involves comparing images and noticing their

    commonalities. Later, I will argue that both of these comparisons can best be modeled as structure-

    mapping processes (Gentner, 1983).

    Factors Contributing to Difficulty

    The oddity task problems (Dehaene et al., 2006) were designed to evaluate understanding of two-

    dimensional spatial concepts. As such, they vary greatly in the type and complexity of the spatial

    concepts they test for. Several of the more difficult problems involve rotations or reflections between


    shapes. Thus, as with geometric analogy, people may have more difficulty with problems that require

    spatial visualization.

Importantly, these problems do not merely rely on one's ability to mentally rotate shapes. They also

    require that one notice that there is a possible shape transformation, and then put in the effort to compute

    the transformation. Thus, there may be a motivational element. Recall that on the RPM, higher-

    performing participants were more likely to notice transformations like a rotation of an arrow shape, even

when the transformation wasn't strictly necessary for solving the problem. I believe that across all three

    tasks, recognition of spatial transformations depends on: 1) spatial visualization ability, and 2) a general

intellectual interest in comparing elements and looking for more abstract relationships between them.

    1.3 Claims

    My thesis rests on two broad sets of claims. The first set is about the representations and processes

    we can use to model human spatial problem-solving. The second set is about what we can learn from

these models, in terms of the factors that contribute to a problem's difficulty and the processes that

    explain variation in human performance.

    Modeling Spatial Problem-Solving

    The studies described above tell us a little about the processes, and less about the representations, that

    people utilize during spatial problem-solving. There are several open questions that must be resolved to

    model the tasks from end to end. I propose the following hypotheses about human representations and

    processes:

1) When possible, people utilize qualitative representations of the spatial relations between elements

    in an image (e.g., Biederman, 1987; Kosslyn et al., 1989; Forbus, Nielsen, & Faltings, 1991) when

    reasoning about space.


    2) These qualitative representations are hierarchical (Palmer, 1977). That is, one can reason about

    relations between objects in an image, or one can focus in and reason about relations between parts of an

    individual object, or one can zoom out and reason about relations between groups of objects.

    3) Qualitative, relational representations are compared via structure-mapping (Gentner, 1983, 1989), a

    domain-general process of structural alignment. Structure-mapping can identify commonalities and

    differences in two representations, or it can determine their similarity.

    4) The computation of spatial transformations, such as rotations between shapes, is accomplished via

    an interaction between hierarchical representations and structure-mapping. To compute a transformation

between two shapes, an individual uses structure-mapping over representations of the shapes' parts to

    identify corresponding parts. Then, a spatial transformation is applied to put the corresponding parts into

    alignment.

    5) Spatial problem-solving requires control processes which can look over a problem, apply a

strategy (where a strategy consists primarily of a set of comparisons via structure-mapping) and, when a

    strategy fails, backtrack and attempt a new strategy.

    In the following subsections, I briefly justify each hypothesis and then outline how it can be modeled.

    1) Qualitative Representations of Spatial Relations

    There is good evidence that people are sensitive to the qualitative, or categorical, relations between

    objects in a visual scene. For example, parallel lines (Abravanel, 1977) and concave corners between

    edges of a shape (Bhatt et al., 2006) are particularly salient to people, suggesting that we make a

    qualitative distinction between parallel and non-parallel, or between concave and convex. In some cases,

people go out of their way to identify a qualitative relation between objects: when people are asked to memorize the location of a dot in a circle, there is evidence they mentally divide the circle into four

    quadrants and then qualitatively encode which of the four quadrants the dot was located in (Huttenlocher

    et al., 1991).
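As a toy illustration of how such qualitative distinctions might be computed from quantitative input, the sketch below reduces coordinates and orientations to categorical facts like those just described; the thresholds and function names are my own illustrative assumptions, not the encoding scheme used by the models in later chapters.

```python
# Toy qualitative encoders: quantitative input in, categorical facts out.

def dot_quadrant(dot_xy, circle_center):
    """Which quadrant of the circle the dot lies in (cf. Huttenlocher et al., 1991)."""
    dx, dy = dot_xy[0] - circle_center[0], dot_xy[1] - circle_center[1]
    return ("upper" if dy >= 0 else "lower") + "-" + ("right" if dx >= 0 else "left")

def parallel(orientation1_deg, orientation2_deg, tolerance_deg=5.0):
    """Qualitative parallel / non-parallel distinction between two edge orientations."""
    diff = abs(orientation1_deg - orientation2_deg) % 180.0
    return min(diff, 180.0 - diff) <= tolerance_deg

print(dot_quadrant((3.0, -1.5), (0.0, 0.0)))   # -> lower-right
print(parallel(12.0, 193.0))                   # -> True (orientations differ by ~180 degrees)
```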


correspondences between elements in the two cases; 2) a structural evaluation score, a measure of the

similarity of the cases based on the systematicity of the mapping; and 3) candidate inferences, inferences

    about the target based on elements in the base that failed to map to the target. Candidate inferences can

    be used to identify differences between the base and the target.

    While structure-mapping was originally proposed to explain abstract analogies, there has been

mounting evidence that it can also explain people's concrete comparisons of visual stimuli (Markman &

    Gentner, 1996; Lovett et al., 2009a; Sagi, Gentner, & Lovett, in press). I believe it may play a ubiquitous

    role in spatial problem-solving because of its usefulness at various stages in the problem-solving process.

    For example, the infer-infer-compare strategy for geometric analogy, described above, relies on both

    computing the differences between images and comparing two sets of differences. SME can do both

    these things, as the sets of differences computed by SME can themselves be compared via SME in a

    second-order comparison (Lovett et al., 2009b).

    4) Structure-Mapping in Spatial Transformations

    One of my goals is to better understand how people perform spatial visualization, particularly how

they compute transformations between shapes. Work on mental rotation (Shepard & Metzler, 1971;

Shepard & Cooper, 1982) has shown that the time required to determine that one shape is a rotation of

    another is proportional to the degrees of rotation between them. This suggests that people perform this

    task by mentally rotating their representation of one shape to align it with the other. If so, people cannot

    be working solely with a qualitative shape representation, as described above. Qualitative representations

    would not include the absolute orientations of shape parts, so there would be no need to transform them in

an analog fashion.

I believe mental rotation is performed on a quantitative, orientation-specific shape representation.

    While the exact form of this representation is unclear, it can be modeled simply as the set of edges in a

shape, along with each edge's quantitative orientation, length, and location. However, a mystery remains.

    Typically, the time to perform a mental rotation is proportional to the degrees of rotation along the


    shortest possible rotation between the two shapes. How do people know which way to rotate one shape to

    quickly align it with the other?

    To answer this question, I have proposed a three-stage model of mental rotation:

    1) Compare qualitative edge-level representations via structure-mapping. Identify the corresponding

    edges.

    2) Take a single pair of corresponding edges and calculate the shortest possible rotation between

    them.

    3) Apply this rotation to the full set of quantitative edges in the first shape. Evaluate whether this

    aligns those edges with the edges in the second shape.

    This approach can identify other transformations, such as reflections, as well. I describe the approach

    in detail in Chapter 3.
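A minimal numeric sketch of these three stages appears below. It assumes that stage 1 (the structure-mapping comparison) has already produced the pairs of corresponding edges, and that both shapes are expressed as edge endpoints around a shared center; everything else, including the function names, is illustrative rather than actual model code.

    import math

    def orientation(edge):
        (x1, y1), (x2, y2) = edge
        return math.atan2(y2 - y1, x2 - x1)

    def shortest_rotation(theta_a, theta_b):
        """Stage 2: smallest signed rotation (radians) taking orientation a to b."""
        return (theta_b - theta_a + math.pi) % (2 * math.pi) - math.pi

    def rotate(point, angle):
        x, y = point
        return (x * math.cos(angle) - y * math.sin(angle),
                x * math.sin(angle) + y * math.cos(angle))

    def rotation_aligns(shape_a, shape_b, pairs, tol=1e-6):
        """Stage 3: apply the candidate rotation to every edge of shape_a and check
        that each lands on its corresponding edge in shape_b.
        `pairs` is the stage-1 output: a list of (index_in_a, index_in_b)."""
        i, j = pairs[0]
        angle = shortest_rotation(orientation(shape_a[i]), orientation(shape_b[j]))
        for i, j in pairs:
            rotated = [rotate(p, angle) for p in shape_a[i]]
            if any(math.dist(p, q) > tol for p, q in zip(rotated, shape_b[j])):
                return None
        return angle

    # A unit horizontal edge vs. the same edge rotated 90 degrees about the center.
    a = [((0.0, 0.0), (1.0, 0.0))]
    b = [((0.0, 0.0), (0.0, 1.0))]
    print(rotation_aligns(a, b, [(0, 0)]))  # ~1.5708 radians (90 degrees)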

    5) Control Processes

    I have argued that structure-mapping can account for much of the processing in spatial problem-

    solving. However, there must also be a control process which chooses a strategy, oversees the application

    of that strategy, and selects a new strategy if the original fails to produce a solution. New strategies may

be necessary when a particularly complex problem overloads working memory (Mulholland,

    Pellegrino, & Glaser, 1980; Bethell-Fox, Lohman, & Snow, 1984), or when a tricky problem causes one

    to identify the wrong corresponding elements (Primi, 2001).


Figure 1.4 Two related geometric analogy problems (A and B), along with average reaction times (A: 6.7 s; B: 26.7 s) and overall accuracies (A: 100%; B: 56%) for human participants.

For example, consider the geometric analogy problems in Figure 1.4. While these problems are nearly identical, the second one is significantly harder (Lovett et al., 2009b) because the obvious mapping between images A and B, in which the large triangle in A corresponds to the large triangle in B, is incorrect. Thus, the problem requires backtracking and identifying an alternate, less obvious mapping, in which the small triangle in A corresponds to the large triangle in B. Clearly, this type of backtracking is difficult for people.

I have built control processes directly into Spatial Routines for Sketches. A spatial routine can include loops that iterate through multiple strategies until a sufficient solution is found; a sufficient solution may be one whose SME mapping receives a reasonably high score, or one whose SME comparison contains no differences. Spatial routines can also contain nested loops, allowing multiple combinations of strategies to be attempted. Thus, a spatial routine is much like a computer program (Ullman, 1987).
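In rough pseudo-Python, such a control loop might look like the sketch below. This is not the actual SRS routine language; the strategy representation, threshold, and names are all illustrative.

    def run_routine(problem, strategies, threshold=0.8):
        """Try each strategy in turn; accept the first solution whose comparison
        score is high enough, otherwise backtrack and try the next strategy."""
        for strategy in strategies:
            solution, score = strategy(problem)
            if solution is not None and score >= threshold:
                return solution
        return None  # every strategy failed; a real routine might relax its criteria

    # Two toy strategies: the first fails, the second succeeds.
    strategies = [lambda p: (None, 0.0), lambda p: ("answer-3", 0.9)]
    print(run_routine("problem-12", strategies))  # -> "answer-3"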

Note that I do not model working memory capacity. Because there are no limits to a model's working memory, it will never fail to solve a problem because too many things must be kept in memory. Similarly, it will never forget the previously completed steps of a problem and need to rework them. I


    view the modeling of working memory capacity as future work which can be attempted when spatial

    working memory is better understood.

    Using the Models to Study Human Performance

    Once models of spatial problem-solving have been developed, we can use them to study human

    performance on the tasks, identifying the factors that make one problem easier or harder than another.

    While several other researchers have considered this issue, an end-to-end task model allows us to be

much more explicit about how a particular factor puts more load on the model's representations and

    processes. The models can also be used to automatically tag each problem with information such as the

    number of elements that must be represented, the number of operations that must be applied to solve the

    problem, etc. Thus, a model can code a problem more objectively than if we simply hand-code each

    problem stimulus based on our best guesses about what the problem requires.

    Once problems are coded for different factors, we can build mathematical models of performance

    matching particular individuals, or groups, in order to determine how individuals vary in their ability to

    handle different sources of difficulty. In this way, we can hopefully develop a better understanding of

commonalities and differences in people's spatial reasoning ability.

    In my analysis, I focus on three broad classes of factors: encoding & abstraction, working memory

    load, and control processes. I describe each of these next.

    Encoding & Abstraction

    Encoding & abstraction refers to the ability to encode representations at the appropriate level of

    abstraction for solving a problem. Some problems may be more difficult because they require

    representations that people are less inclined or less able to compute. I am interested in two forms of

abstraction, which I call entity abstraction and relational abstraction.

    Problems that are high on entity abstraction require the solver to encode elements at a particular level

    in the representational hierarchy. This might be either a larger scale or a smaller scale than the most


comfortable level for an individual. To demonstrate, let us consider a simple form of analogical problem-solving: the letter string analogy (Mitchell, 1993; Hofstadter, 1995). The simplest letter string analogy is:

abc : abd :: rst : ?

Here, the obvious solution is "rsu", based on incrementing the final letter by one. Suppose instead we have:

abc : abd :: rrsstt : ?

While an obvious solution here would be "rrsstu", a better one might be "rrssuu". Discovering this solution depends on entity abstraction. One must recognize that the appropriate level of abstraction for representing the "rrsstt" letter string is as two-letter groups. The entire "tt" group in "rrsstt" maps to the letter "c" in "abc".
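This re-representation step is easy to state computationally; the short Python sketch below (an illustration, not part of any letter-string model cited here) collapses runs of identical letters into group-level entities.

    from itertools import groupby

    def letter_groups(s):
        """Re-represent a letter string at the group level: runs of identical
        letters become single entities, e.g. "rrsstt" -> ["rr", "ss", "tt"]."""
        return ["".join(run) for _, run in groupby(s)]

    print(letter_groups("rrsstt"))  # ['rr', 'ss', 'tt'] -- three entities, like "abc"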

On the other hand, problems that are high on relational abstraction require the solver to notice more complex relations between elements. Often, this involves comparing elements within a single image (or letter string) to recognize commonalities and differences among them, before one compares across images (or letter strings). For example, consider the problem:

abc : abd :: hkkuuu : ?

Here, an obvious solution might be to apply the letter successor relation and get "hkkuuv" or "hkkvvv". However, if one spends more time considering the relationships between the elements in "hkkuuu", one should recognize that this letter string contains a new kind of successor relationship: group length. One might then choose to increment this successor relationship instead, to get "hkkuuuu".
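Again purely for illustration, the sketch below detects whether the group-length successor relation holds within a string; the helper names are hypothetical.

    from itertools import groupby

    def group_lengths(s):
        """Lengths of the runs of identical letters, e.g. "hkkuuu" -> [1, 2, 3]."""
        return [len(list(run)) for _, run in groupby(s)]

    def has_length_successor(s):
        """True when each group is one letter longer than the previous group,
        i.e. the successor relation holds over group length rather than letters."""
        lengths = group_lengths(s)
        return all(b == a + 1 for a, b in zip(lengths, lengths[1:]))

    print(has_length_successor("hkkuuu"))  # True  -> increment the length: "hkkuuuu"
    print(has_length_successor("rrsstt"))  # False -> fall back on the letter successor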

    Of course, entity and relational abstraction occur in slightly different forms in spatial problem-

    solving. Entity abstraction is the ability to build a representation at the level of groups of objects, objects,

    or parts of an object, as the problem demands. Relational abstraction is the ability to recognize complex

    relations, particularly spatial transformations such as rotations or reflections. I use the letter string

    examples to illustrate that these abilities are not specific to the domain of spatial problem-solving. Entity


    and relational abstraction may rely on a general inclination to try out different abstractions for

    representing stimuli.

Unfortunately, my analysis is insufficient for distinguishing between the inclination and the ability to form abstractions. Thus, relational abstraction refers to both an interest in comparison and abstraction, and an ability to compute spatial transformations between shapes, i.e., spatial visualization ability.

    The simplest way to determine whether a certain problem requires entity or relational abstraction is

via ablation. That is, one can temporarily remove the model's ability to generate representations at a

    particular level, or to compute spatial transformations, and check whether the model now fails on the

    problem.
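Schematically, an ablation test is just a re-run of the model with one capability switched off; the sketch below makes this explicit, with a toy model and made-up capability names standing in for the real system.

    def requires_capability(model, problem, capability):
        """Ablation test: the problem requires the capability if the model solves
        it normally but fails once that capability is disabled."""
        solves_normally = model(problem, disabled=set())
        solves_ablated = model(problem, disabled={capability})
        return solves_normally and not solves_ablated

    # Toy model: succeeds only when group-level encoding is available.
    toy_model = lambda problem, disabled: "group-level-encoding" not in disabled
    print(requires_capability(toy_model, "oddity-17", "group-level-encoding"))  # True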

    Working Memory Load

Working memory load is simply the number of things that must be kept in mind when solving a problem. Several studies have suggested that people are slower and less accurate when solving problems that involve more elements or more transformations between elements (e.g., Mulholland, Pellegrino, & Glaser, 1980; Bethell-Fox, Lohman, & Snow, 1984; Embretson, 1998; Vodegel-Matzen, van der Molen, & Dudink, 1994). One advantage of computational models is that they can automatically code the number of elements, number of relations, number of differences between representations, etc., that are used to solve a problem. Thus, we can easily code a problem for working memory load. We can evaluate to what extent these different factors (number of elements, number of relations, etc.) place a load on working memory by considering which factors correlate with problem difficulty.
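The analysis itself is straightforward once the model has produced the codes. The sketch below, using entirely hypothetical per-problem counts and error rates, shows the kind of factor-by-factor correlation intended here (statistics.correlation requires Python 3.10 or later).

    from statistics import correlation

    # Hypothetical codes a task model might emit for four problems, plus observed error rates.
    problems = [
        {"objects": 2, "relations": 1, "differences": 1, "error_rate": 0.05},
        {"objects": 3, "relations": 3, "differences": 2, "error_rate": 0.15},
        {"objects": 4, "relations": 5, "differences": 3, "error_rate": 0.30},
        {"objects": 5, "relations": 6, "differences": 4, "error_rate": 0.45},
    ]

    difficulty = [p["error_rate"] for p in problems]
    for factor in ("objects", "relations", "differences"):
        values = [p[factor] for p in problems]
        print(factor, round(correlation(values, difficulty), 2))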

    Control Processes

    The articles cited above argue that more complex problems are harder only because of increased

    working memory load. However, an alternate explanation is that, in some cases, more complex problems

require more careful control over problem-solving strategies. For example, in Raven's Progressive

    Matrices, as the number of elements increases, the problem of identifying the corresponding elements


    becomes more difficult. Thus, one is more likely to identify incorrect corresponding elements on the first

    try, fail to solve the problem, and require backtracking.

Carpenter, Just, and Shell (1990) have referred to this problem as "goal management" and have

    suggested that goal management itself boils down to working memory, since more complex problems

    require keeping a larger hierarchy of goals in memory. However, I believe we should consider control

    processes as a separate factor, particularly as the computational models allow us to directly code the

    number of operations and the amount of backtracking required to solve a problem. Thus, we can look at

    whether this factor explains any of the variance in problem difficulty, beyond what working memory load

    explains.
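One way to test this is incremental regression: fit problem difficulty from the working-memory factors alone, then add the control-process factors and see whether explained variance improves. The sketch below does this with NumPy least squares on invented data; all numbers are hypothetical and serve only to show the shape of the analysis.

    import numpy as np

    # Hypothetical problem codings: working-memory factors (elements, relations)
    # and a control-process factor (backtracking steps the model needed).
    X_wm = np.array([[2, 1], [3, 3], [4, 5], [5, 6], [4, 4], [6, 7]], dtype=float)
    backtracks = np.array([0, 0, 1, 2, 0, 3], dtype=float)
    difficulty = np.array([0.05, 0.15, 0.40, 0.60, 0.25, 0.80])

    def r_squared(X, y):
        X1 = np.column_stack([X, np.ones(len(y))])       # add an intercept term
        coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
        residuals = y - X1 @ coef
        return 1 - residuals.var() / y.var()

    print("WM load only:        ", round(r_squared(X_wm, difficulty), 3))
    print("WM load + backtracks:", round(r_squared(np.column_stack([X_wm, backtracks]), difficulty), 3))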

    1.4 Contributions

    My thesis consists of five contributions. Firstly, the Perceptual Sketchpad is an extension to the

    CogSketch sketch understanding system. Given a set of objects drawn in CogSketch, the Perceptual

    Sketchpad automatically segments each object into its edges and groups objects together based on

    similarity. It generates human-like qualitative representations at three hierarchical levels: groups, objects,

    and edges.
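The hierarchical output can be pictured as a simple nested data structure. The Python sketch below is only a schematic picture of the three levels; the real Perceptual Sketchpad works over CogSketch's predicate-calculus representations, and the class and slot names here are invented.

    from dataclasses import dataclass, field

    @dataclass
    class Edge:
        kind: str                                       # e.g. "straight" or "curved"
        relations: list = field(default_factory=list)   # e.g. ["parallel-to edge-2"]

    @dataclass
    class SketchObject:
        name: str
        edges: list
        relations: list = field(default_factory=list)   # e.g. ["left-of object-2"]

    @dataclass
    class Group:
        members: list                                   # objects grouped by similarity
        relations: list = field(default_factory=list)

    square = SketchObject("object-1",
                          edges=[Edge("straight") for _ in range(4)],
                          relations=["left-of object-2"])
    circle = SketchObject("object-2", edges=[Edge("curved")])
    scene = Group(members=[square, circle])
    print(len(scene.members), len(square.edges))  # 2 objects, 4 edges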

Secondly, Spatial Routines for Sketches (SRS) is a modeling framework inspired by Ullman's (1987) visual routines. As with visual routines, SRS utilizes a set of cognitive operations that we know people

    can accomplish. These include perceptual encoding and comparison via SME. The operations can be

    combined to create a spatial routine for performing some spatial task. For example, a spatial routine

    could describe one strategy for solving geometric analogy problems. A spatial routine is analogous to a

    computer program: each operation takes an input, processes it, and produces an output.

    SRS allows researchers to construct cognitive models of human performance on a task. Once a

    researcher has written a routine, SRS implements its operations, allowing the routine to be tested on


    for human accuracy and reaction times. In this way, the models can help us better understand what makes

    a problem easy or difficult for people.

    Finally, for the oddity task only, I have built separate linear models for the different cultural and age

    groups. These models highlight some commonalities and differences in the representations and processes

    people use to reason about space.

    1.6 Outline of the Thesis

    In the following two chapters, I consider the key problems of visual encoding and comparison. I

begin by outlining the psychological evidence for hierarchical hybrid representations, containing both

    qualitative and quantitative features at multiple levels of abstraction. I show how even basic visual

    comparisons require a strategic search through this hierarchy. I then describe my implementations of

    perceptual encoding, image comparison, and shape comparison.

    In Chapter 4, I present Spatial Routines for Sketches (SRS), a general framework for modeling spatial

    problem-solving. I enumerate its key operations, including perceptual encoding, visual comparison, and

    visual inference.

    In Chapter 5, I give an introduction to cognitive modeling with SRS, discussing the general principles

    that go into all my task models.

In Chapters 6-8, I describe the task models for geometric analogy, Raven's Progressive Matrices, and

    the oddity task. I present results comparing each model against human performance.

    In Chapter 9, I discuss predictions that follow from my models. Several hypotheses and assumptions

    went into the production of each model, and these can be explicitly tested in future psychological studies.

    Finally, in Chapter 10, I draw conclusions, discuss related work, and consider directions for future

    research.


    2. Background

Spatial problem-solving depends on the ability to compare images and identify similarities and differences. This is difficult because even simple images contain many features, of which only a few are critical to a given comparison. For example, Figure 2.1A displays an oddity task problem, in which one must choose the image that doesn't belong. Each image contains four circles. The circles possess various features (e.g., roundness), and there are several spatial relations between the circles. On the other hand, each image also contains a row of dots. In five of the images, there is only a single row, whereas the last image contains a row plus an additional outlying dot. Thus, it is much easier to pick the odd image out if we view the images as rows than if we view them as individual circles. However, we have no way of knowing a priori what features will be important when we consider a problem.

Figure 2.1 Two oddity task problems (A and B) from Dehaene et al. (2006). Pick the image that doesn't belong.

    To solve these problems, one must flexibly move between candidate representations until a

comparison's critical features are discovered. Here, I argue that human perceptual systems produce

    hierarchical hybrid representations (HHRs) of space. These representations are hierarchical in that a

    given stimulus can be encoded at several levels of abstraction. They are hybrid in that, at each level, there

    are at least two types of representations: qualitative and quantitative. For example, consider the edges of

    the objects in Figure 2.1B. These edges possess several quantitative features, such as length and


orientation, that are irrelevant to the problem. However, if we focus on the qualitative relations between

    edges, we see that in all images except one, there are two pairs of parallel edges.

    Psychological evidence suggests hierarchical representations are explored top-down (Hochstein &

    Ahissar, 2002). That is, an individual will first compare two stimuli at a high level of abstraction and

    then move down as needed. In contrast, the temporal interaction between qualitative and quantitative

    representations is less clear (Bornstein & Korda, 1984; Kosslyn et al., 1977). However, in both cases,

    individuals may strategically reprioritize representations, depending on the task demands.

    Top-down comparison of hierarchical representations may depend on identifying corresponding

    elements in structural descriptions. I will argue that structure-mapping theory (Gentner, 1983, 1989)

    provides a parsimonious explanation of how people align representations during visual comparison.

    I begin by presenting psychological evidence for hybrid and for hierarchical representations. I then

    discuss structure-mapping as a mechanism for visual comparison over HHRs. Afterwards, I apply HHRs

    to mental rotation, a commonly studied comparison task. Finally, I consider how individuals strategically

    search through HHRs during comparison.

2.1 Hybrid Representations

Evidence for hybrid representations comes from at least three bodies of research: categorical

    perception, object recognition, and visual priming. I consider each of these in turn. Many of the

    experiments below use the same/different paradigm. In visual same/different experiments (Farell, 1985),

    participants are shown two images, either sequentially or simultaneously, and asked whether they are the

    same or different. By varying the type and degree of differences, researchers can study the form of visual

    representations. Furthermore, by varying the exposure time and the interval between stimuli, researchers

    can potentially isolate the contributions of encoding, memory, and comparison processes. As we shall

    see, same/different experiments provide evidence for at least two distinct representations: qualitative and

    quantitative.


    Categorical Perception

    Categorical perception is our ability to use qualitative categories when distinguishing between

quantitative values. In categorical perception experiments, participants compare stimuli that typically vary along a single dimension (e.g., color: Bornstein & Korda, 1984; size: Kosslyn et al., 1977; or

    location: Maki, 1982). A common finding is that the more distant the stimuli are along that dimension,

    the more quickly an individual can recognize that they are different, or that one is greater than the other.

    However, when the stimuli have different categorical values, performance improves significantly.

For example, Bornstein and Korda (1984) showed participants color patches in a sequential same/different task. They used seven patches that varied from blue to green, such that the middle patch (4) was ambiguous in color. Thus, they could independently vary the quantitative difference and the qualitative difference (same-color vs. different-color) between patches. They found that for pairs with a distance of two, participants responded "different" faster when the patches were different colors than when they were the same color, suggesting that participants used the color labels when comparing. However, participants could also distinguish between different patches with the same label, indicating they had access to the quantitative hue value. Bornstein and Korda theorized that participants were comparing the quantitative and qualitative color values in parallel, and responding as soon as either comparison returned a difference.
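A toy race model captures the logic of that explanation: run a quantitative comparison and a qualitative comparison in parallel and respond with whichever finishes first. The Python sketch below is only an illustration of the idea; its timing parameters are made up and are not fit to Bornstein and Korda's data.

    def race_model_rt(hue_a, hue_b, label_a, label_b,
                      base=400.0, quant_gain=600.0, qual_bonus=150.0):
        """Hypothetical response time (ms) for a "different" judgment: a quantitative
        hue comparison and a qualitative label comparison race in parallel."""
        if hue_a == hue_b and label_a == label_b:
            return None  # a "same" response; not modeled here
        quant_rt = base + quant_gain / max(abs(hue_a - hue_b), 1e-6)
        qual_rt = base + qual_bonus if label_a != label_b else float("inf")
        return min(quant_rt, qual_rt)

    # Same quantitative distance (two patches apart), different vs. same category:
    print(race_model_rt(3, 5, "blue", "green"))   # faster: the label comparison wins
    print(race_model_rt(5, 7, "green", "green"))  # slower: only the hue comparison finishes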

    Categorical perception has been studied extensively in the color domain (e.g., Bornstein & Korda,

    1984; Roberson, Pak, & Hanley, 2008; Regier & Kay, 2009). However, it has also been found in other,

    more spatial domains. Kosslyn et al. (1977) taught participants to draw 6 stickmen of varying sizes and

also trained them to characterize the three largest as "large" and the three smallest as "small". Similarly,

    Maki (1982) taught participants the locations of twelve cities on a map, in which six cities were in a

    northern state and six were in a southern state. Afterwards, participants were given the names of stimulus

    pairs and asked to judge which was larger or farther north. Both researchers found that participants


    responded faster when the stimuli were farther apart along the relevant dimension. However, participants

    also responded faster to stimuli on opposite sides of the category boundary they had learned.

Categorical perception has also been found for spatial relations between an object's parts. Rosielle and Cooper (2001) used line-drawings of 3-D objects in a sequential same/different study. Each object had two parts connected by an arm. The arm's orientation varied along four values: 0°, 30°, 60°, and 90°. For the first value (0°), participants might recognize a qualitative parallel relationship between the parts. For the last value (90°), participants might recognize a perpendicular relationship. However, either of the middle two values would likely be labeled as oblique. Rosielle and Cooper compared performance across the conditions where the two objects had identical parts but their angles differed by 30°. They found that participants responded slowest (and error rates were highest) when the object changed from one oblique angle to the other. Participants performed better when the object changed to or from a parallel or perpendicular angle, suggesting they were encoding and using these qualitative spatial relations.

    Finally, Ferguson, Aminoff, and Gentner (1996) showed that categorical perception can play a role

    even when perceiving a single object. They showed participants 2-D polygons and asked them to

    evaluate whether each polygon possessed an axis of symmetry. This is essentially a same-different task

    in which the participant is comparing one half of a shape to the other. The experimenters looked at the

    time and accuracy to detect purely quantitative asymmetries vs. qualitative asymmetries, where the latter

    included changes in the number of vertices and changes between concave and convex vertices. The

    experimenters attempted to keep the degree of quantitative asymmetry constant while varying the

    presence of qualitative asymmetries. They found that participants were better able to detect qualitative

    asymmetries, suggesting again that participants use qualitative spatial relations during comparison.

These studies indicate that participants consider both quantitative and qualitative features during visual comparison. Features may be purely visual (e.g., color) or spatial. Spatial features may include attributes of individual objects or relations between elements (e.g., relative orientation, convex vs.


    concave). It is less clear whether individuals prefer one type or the other. Some researchers (Bornstein &

    Korda, 1984; Kosslyn et al., 1977) have suggested that qualitative and quantitative features are compared

    in parallel, so that a large difference along either dimension produces a fast response. We shall return to

    this question later.

    Object Recognition

A heated debate in the field of object recognition provides further evidence. Here, the question is how people rapidly recognize objects from different viewpoints. Biederman and colleagues (Biederman, 1987; Hummel & Biederman, 1992) have proposed that people represent an object as a set of shape primitives, called geons, with qualitative spatial relations between the geons. Geons and relations are identified based on non-accidental properties, properties of the image that are unlikely to be tied to a specific viewpoint. For example, if two edges along a shape's surface appear parallel from one viewpoint, they will also appear parallel from most other viewpoints. Thus, a given object should produce a similar geon representation from many different viewpoints, facilitating the recognition process.

A set of geons, features of the geons, and spatial relations between the geons make up a geon structural description (GSD). This approach aligns with the qualitative representations described above in that GSDs capture qualitative features of, and relations between, the parts of an object. It also provides some pointers to what qualitative features might be encoded: those features which are unlikely to have occurred accidentally, due to viewing a scene from a particular viewpoint.

Biederman and colleagues predict that any rotation which does not alter an object's GSD should not interfere with one's ability to recognize it. They have shown (Biederman & Gerhardstein, 1993) that

    participants can quickly recognize two objects as the same, regardless of rotation in depth, provided they

    generate the same GSD. Similarly (Biederman & Bar, 1999), participants can quickly recognize that two

    objects are different, regardless of rotation in depth, when they generate different GSDs, i.e., they differ in

  • 7/27/2019 Lovett Thesis Final

    47/312

    47

    a non-accidental property. However, when two objects differ only in a quantitative property, recognizing

    they are different over rotations is difficult. This fits well with the above finding (Rosielle & Cooper,

    2001) that participants can better tell two objects are different when the change in their angles is

    qualitative, rather than quantitative.

    However, these results have not gone unchallenged. Tarr and colleagues (Tarr et al., 1997; Tarr et al.,

    1998; Hayward & Tarr, 2000) have argued that we use viewpoint-dependent representations to recognize

    objects. They claim that we deal with multiple viewpoints by learning several different representations

    for a given object, and by mentally transforming between a novel view and the closest known view.

    Thus, they predict that when an object is unfamiliar, there should be a cost for recognizing it from a novel

    viewpoint. In support of this, their studies have shown a cost for recognizing objects from new

    viewpoints, even when the GSDs are mostly identical. These studies used the same paradigms (Tarr et

    al., 1997) and sometimes even the same stimuli (Tarr et al., 1998) as Biederman and colleagues. Thus,

    apparently very subtle differences can shift participants between representations that are more or less

    viewpoint-specific.

    Biederman and Bar (1999) suggest one factor is the accessibility of the geons in a 3-D shape. Even

    differences to a single vertex can make it harder to recognize a rotated version of an object (Biederman &

    Bar, 1998). Of course, this means the viewpoint-invariant representation is quite brittle. If it is so easily

    disrupted, then one cannot expect two viewpoints of the same object to produce identical representations

    consistently. On the other hand, viewpoint-invariant recognition is achievable in some situations

    (Biederman & Gerhardstein, 1993). Thus, the results suggest: a) we can access a viewpoint-invariant

    representation, such as GSD; b) this representation can be used to quickly compare two object images; c)

    when this representation is insufficient, we must fall back on viewpoint-specific information to perform

    the comparison. Here, again, we see a hybrid perceptual representation. One component is qualitative

    and primarily viewpoint-invariant. The other contains quantitative and viewpoint-specific information.


    Visual Priming

    Further support for hybrid representations comes from visual priming studies. Hummel and

colleagues (Stankiewicz, Hummel, & Cooper, 1998; Stankiewicz & Hummel, 2002; Thoma, Hummel, & Davidoff, 2004) have argued that a viewpoint-invariant representation such as GSD can only be generated

    when one attends to an image. Attention allows one to identify the parts of an object and bind them to

    features and relations (Treisman & Gelade, 1980), thus generating a structural description. If an

    individual is exposed to an image but fails to attend to it, they will generate a more holistic, viewpoint-

    specific representation.

In their experiments, two objects are displayed on the screen together. Participants are cued to attend to one, while they are given very little time to notice the other; typically, the two objects together are displayed for only 120 ms. Afterwards, participants are shown another object and asked to name it. The experimenters predict that if an individual attends to an object, they will generate a viewpoint-invariant representation that will prime them for naming that object again, even if the object is transformed. However, if an individual doesn't attend to an object, they will generate only a viewpoint-specific representation. This representation will support priming across a much narrower set of transformations.

Hummel and colleagues found that when participants attended to an object, they were primed to

    recognize it again, even if the original object was mirror-reflected (Stankiewicz, Hummel, & Cooper,

    1998), or scrambled (Thoma, Hummel, & Davidoff, 2004) by splitting the image into two halves and

    inverting them. Importantly, the priming for both mirror-reflected and scrambled images was greater than

    the priming for seeing a different object from the same class (e.g., two different pianos), so this was not

merely semantic priming.

In contrast, an ignored image failed to prime mirror-reflected or unscrambled versions of itself,

    although it did prime an identical version, as well as larger or smaller versions (Stankiewicz & Hummel,

    2002). Curiously, a scrambled, ignored image failed to prime even an identical scrambled image (Thoma,

    Hummel, & Davidoff, 2004). This, along with the size invariance, suggests that the ignored image does


    not simply produce a pixel-by-pixel representation. Some amount of abstraction or recognition appears to

    occur even for ignored images, although it is insufficient for supporting transformations such as mirror

    reflection.

    Characterizing Qualitative Representations

    We have now seen strong evidence for a qualitative/quantitative distinction in visual representations,

    as well as reasonably strong evidence for a viewpoint-invariant/viewpoint-specific distinction. However,

    two questions remain: 1) Are the qualitative and the quantitative simply two sets of features in a single

    representation, or are there real differences in the ways they are represented and reasoned over? 2) How

    closely tied together are the qualitative representation and the viewpoint-invariant representation? To

    better answer these questions, I will now make some claims about the nature of qualitative

    representations.

    1) Qualitative representations appear linked to symbolic processing in the brain

    Qualitative features (e.g., blue, above, or parallel) often correspond to single words, whereas

    quantitative features (e.g., 3 inches to the right) often do not. Therefore, if qualitative and quantitative

    information are represented separately in the brain, one might suppose that the left hemisphere, known for

    language-processing (at least in right-handed ind