Lovett Thesis Final


Copyright by Andrew Lovett 2012

All Rights Reserved


    ABSTRACT

    Spatial Routines for Sketches: A Framework for Modeling Spatial Problem-Solving

    Andrew Lovett

Spatial problem-solving tasks are often used to evaluate people's cognitive abilities. For example,

Raven's Progressive Matrices is a popular intelligence test. In it, an individual is shown an array of two-

    dimensional images, with one image missing. The individual must compare the images and identify a

    pattern of differences between them, in order to solve for the missing image. Performance on tasks such

as Raven's and geometric analogy (A is to B as C is to ...?) correlates strongly with performance on

    many other ability tasks, in the spatial, verbal, and mathematical domains. Thus, these tasks appear to

    depend on core, general-purpose representations and processes. However, it is as yet unclear what those

    representations and processes are.

    To better understand these tasks, we developed Spatial Routines for Sketches (SRS), a general

    framework for modeling spatial problem-solving. SRS is based on a set of psychological claims about

    how people perform spatial problem-solving: 1) When possible, people use qualitative representations

    describing features such as relative position or orientation, rather than exact numerical values. 2) Spatial

    representations are hierarchical. A given image might be represented as object groups, individual objects,

    or the parts within each object. 3) Qualitative spatial representations can be compared via structure-

    mapping. Structure-mapping involves aligning the relational structure in two representations to find the

    corresponding elements.

Three task models were built within the SRS framework: geometric analogy, Raven's Progressive

    Matrices, and the oddity task, in which one sees a set of images and picks the one that is different. The

    three task models use identical representations and similar processes. Thus, they allow us to test the

    generality of the psychological claims, as well as the representations and processes that implement these

    claims.


    TABLE OF CONTENTS

    1. Introduction ............................................................................................................................................16

    1.1 Motivation ......................................................................................................................................19

    1.2 Background ....................................................................................................................................21

    1.3 Claims .............................................................................................................................................29

    1.4 Contributions ..................................................................................................................................39

    1.5 Evaluation .......................................................................................................................................40

    1.6 Outline of the Thesis ......................................................................................................................41

    2. Background ............................................................................................................................................42

    2.1 Hybrid Representations ..................................................................................................................43

    2.2 Hierarchical Representations ..........................................................................................................55

    2.3 Structural Comparison ....................................................................................................................61

    2.4 Mental Rotation: An Example Domain ..........................................................................................62

2.5 Strategic Comparison .....................................................................................................68

    3. Modeling Perception and Comparison ...................................................................................................72

    3.1 Existing Models ..............................................................................................................................73

    3.2 Perceptual Sketchpad......................................................................................................................77

    3.3 Image Comparison .........................................................................................................................91

    3.4 Shape Comparison ..........................................................................................................................97

    3.5 Perceptual Reorganization ............................................................................................................109

    4. Spatial Routines for Sketches ..............................................................................................................112

    4.1 Operation Categories ....................................................................................................................114

    4.2 Spatial Operations ........................................................................................................................118

    4.3 Other Operations ..........................................................................................................................153


    4.4 Spatial Routine Language .............................................................................................................154

    4.5 Gathering Data .............................................................................................................................166

    5. Simulations ..........................................................................................................................................169

    5.1 Representation ..............................................................................................................................170

    5.2 Task Comparison ..........................................................................................................................173

    5.3 Parameters and Sensitivity Analysis ............................................................................................174

    5.4 Analyses .......................................................................................................................................175

    6. Geometric Analogy ..............................................................................................................................177

    6.1 Background ..................................................................................................................................178

    6.2 Model............................................................................................................................................181

    6.3 Model Predictions .........................................................................................................................189

    6.4 Behavioral Experiment .................................................................................................................190

    6.5 Simulation ....................................................................................................................................200

    6.6 Related Work ................................................................................................................................205

    6.7 Conclusion ....................................................................................................................................207

7. Raven's Progressive Matrices .............................................................................................209

    7.1 Background ..................................................................................................................................210

    7.2 Model............................................................................................................................................215

    7.3 Behavioral Experiment .................................................................................................................225

    7.4 Simulation ....................................................................................................................................228

    7.5 Discussion ....................................................................................................................................235

    7.6 Related Work ................................................................................................................................238

    8. The Oddity Task ..................................................................................................................................241

    8.1 Background ..................................................................................................................................242

    8.2 Model............................................................................................................................................243


    LIST OF FIGURES

    Figure 1.1: Geometric analogy problem .....................................................................................................16

    Figure 1.2: Three spatial problem-solving tasks .........................................................................................18

    Figure 1.3: An oddity task problem involving groups of objects ...............................................................32

Figure 1.4: Two related geometric analogy problems, along with average reaction times and overall accuracies for human participants ...............................................................................................................35

Figure 2.1: Two oddity task problems from Dehaene et al. (2006). Pick the image that doesn't belong ... 42

Figure 2.2: Stimuli from (A) Navon (1977, Experiment 4), and (B) Love, Rouder, and Wisniewski (1999) ...............................................................................................................................................................58

    Figure 2.3: Two images that one might compare ........................................................................................60

Figure 2.4: Mental rotation stimuli from (A) Shepard and Metzler (1971), and (B) Cooper and Shepard (1973) ..........................................................................................................................................................62

    Figure 2.5: Reaction time results from Biederman and Bar (1999, Experiment 2) ...................................67

    Figure 3.1: Example stimuli from Love, Rouder, and Wisniewski (1999) .................................................72

    Figure 3.2: The CogSketch sketch understanding system ..........................................................................74

    Figure 3.3: Sketch of a house ......................................................................................................................78

    Figure 3.4: Examples of texture patches .....................................................................................................79

    Figure 3.5: Examples of space objects ........................................................................................................79

    Figure 3.6: Grouping examples ...................................................................................................................80

    Figure 3.7: Examples of parallel edges, parallel objects, and parallel groups ............................................82

    Figure 3.8: Example representation with all expressions about Edge-4 in a parallelogram .......................85

    Figure 3.9: A rotation between two triangles ..............................................................................................85

    Figure 3.10: Two images that might be compared, with the objects labeled ..............................................92


    Figure 3.11: Two ambiguous mappings ......................................................................................................94

    Figure 3.12: Object pairs whose textures would be considered identical ...................................................95

    Figure 3.13: Shape change examples ..........................................................................................................96

    Figure 3.14: Shape rotation example ..........................................................................................................98

    Figure 3.15: A) Spatial reflections. B) Shape reflections .........................................................................101

    Figure 3.16: Examples of lengthening ......................................................................................................104

    Figure 3.17: Example of part lengthening ................................................................................................104

    Figure 3.18: Examples of part addition .....................................................................................................105

    Figure 3.19: Example of a subshape deformation .....................................................................................105

    Figure 3.20: Corresponding edges for a symmetry mapping ....................................................................106

    Figure 3.21: Example of a scaled group ...................................................................................................107

    Figure 3.22: Complex reorganization facilitates this comparison ............................................................111

    Figure 4.1: An ambiguous image ..............................................................................................................113

Figure 4.2: A: Oddity task problem from Dehaene et al. (2006). B: Raven's Matrix-type problem .........115

    Figure 4.3: Geometric analogy problems ..................................................................................................117

    Figure 4.4: A geometric analogy problem, as it looks in CogSketch .......................................................120

Figure 4.5: A Raven's Progressive Matrices "A" problem (not from the actual test) ...............................120

    Figure 4.6: A row of images for difference-finding ..................................................................................129

Figure 4.7: Raven's Progressive Matrix problems requiring a literal pattern of variance .........................131

    Figure 4.8: A row of images with first-to-last matches ............................................................................135

    Figure 4.9: Oddity task problems with orientation differences .................................................................136

    Figure 4.10: A: A geometric analogy problem. B: The patterns of variance ............................................137


    Figure 4.11: Examples of Infer-Shape with reflections ............................................................................140

    Figure 4.12: Examples of Infer-Shape with deformations ........................................................................141

    Figure 4.13: Examples of Infer-Shapes with texture transformations ......................................................141

    Figure 4.14: A geometric analogy problem ..............................................................................................142

    Figure 4.15: A geometric analogy problem with an added object ............................................................143

Figure 4.16: Four 3x3 Raven's Matrix problems .......................................................................................145

    Figure 4.17: A geometric analogy problem with an unclear rotation .......................................................148

Figure 4.18: A Raven's Matrix problem requiring complex perceptual reorganization ............................149

Figure 4.19: A Raven's Matrix problem requiring texture detection .........................................................150

    Figure 4.20: A Bongard problem from (Bongard, 1970) ..........................................................................163

    Figure 4.21: The Routine Inspector on a geometric analogy problem ......................................................167

Figure 5.1: A: Geometric analogy problem (Evans, 1968). B: Raven's Matrix problem. C: Oddity task problem (Dehaene et al., 2006) .................................................................................................................169

    Figure 5.2: Each internal edge of a textured object was added as a separate object .................................172

    Figure 6.1: A geometric analogy problem ................................................................................................178

    Figure 6.2: Image pairs requiring perceptual reorganization ....................................................................182

Figure 6.3: Strategy for finding differences between two images (see Appendix B for its implementation in the Spatial Routine language) ...............................................................................................................184

    Figure 6.4: Geometric analogy problems requiring second-order comparison .........................................187

Figure 6.5: The answer chosen depends on whether one prefers canonical reflections or rotations ........189

Figure 6.6: Reflecting the B shape will produce an identical B shape, as in Figure 6.4B ........................190

    Figure 6.7: The dot to be removed is in different locations in A and C ....................................................190

    Figure 6.8: Problems 1-3 (times are seconds required for human participants to pick an answer; values below answers are the percentage of participants who picked each answer) ............................................194


    Figure 6.9: Problems 4-6 ...........................................................................................................................195

    Figure 6.10: Problems 7-9 .........................................................................................................................196

    Figure 6.11: Problems 10-12 .....................................................................................................................197

    Figure 6.12: Problems 13-15 .....................................................................................................................198

    Figure 6.13: Problems 16-18 .....................................................................................................................199

    Figure 6.14: Problems 19-20 .....................................................................................................................200

    Figure 7.1: A matrix problem ...................................................................................................................209

    Figure 7.2: Example problems for various Carpenter et al. (1990) rules ..................................................211

    Figure 7.3: This matrix problem requires complex perceptual reorganization to solve............................214

    Figure 7.4: Two 1x1 matrix problems ......................................................................................................217

    Figure 7.5: Solution strategies for the above problems .............................................................................217

    Figure 7.6: Problems where computing the pattern of variance may be difficult .....................................219

Figure 7.7: Subroutine for finding differences in a row (see Appendix C for its implementation in the Spatial Routine language). Continued in Figure 7.8 .................................................................................221

    Figure 7.8: Subroutine for finding differences in a row. Continued from Figure 7.7 ..............................222

    Figure 7.9: Different ways two textures might overlap ............................................................................230

Figure 8.1: Oddity task problem from Dehaene et al. (2006). The image without parallel lines is the odd one out .......................................................................................................................................................242

    Figure 8.2: C and D can only be solved by considering edges .................................................................244

Figure 8.3: Certain information can be filtered out...................................................................................245

Figure 8.4: B and C require a different similarity measure .......................................................................246

Figure 8.5: One of the six problems the model failed to solve. Average human performance: 68% (American adults), 86% (Mundurukú) ......................................................................................................250

    Figure 8.6: Problems with closed and open shapes...................................................................................251


    Figure 8.7: Problems relying on (A) shape transformation and (B) shape symmetry ..............................251

    Figure 8.8: Problems requiring shape comparison between images .........................................................256

    Figure 8.9: Problems involving quantitative features ...............................................................................257

    Figure 8.10: Problem in which the quantitative difference is particularly salient ....................................258

    Figure 9.1: Two image pairs. It is harder to align the same-shaped objects in B than in A.....................260

    Figure 9.2: Can you spot the difference between each image pair? ..........................................................261

    Figure 9.3: Can you spot the difference between the more complex image pairs? ...................................261

    Figure 9.4: Object-level differences (A, B) may be more salient than edge-level differences (C) ...........262

    Figure 9.5: Problems involving shape rotations ........................................................................................264

Figure 9.6: An oddity task problem (A) and a Raven's matrix problem (B) with identical images ..........264

    Figure 9.7: This geometric analogy problem requires quantitative information to solve .........................265

    Figure 9.8: This geometric analogy problem presents a novel challenge .................................................265

    Figure A.1: Two images that might be compared .....................................................................................289


    LIST OF TABLES

    Table 3.1: Orientation-invariant qualitative vocabulary for edges ............................................................84

    Table 3.2: Additional orientation-specific vocabulary for edges ................................................................87

Table 3.3: Object-level qualitative vocabulary. Terms marked with an O are orientation-specific ............88

    Table 3.4: Candidate inferences between images 10A and 10B .................................................................92

    Table 3.5: Types of shape comparisons ...................................................................................................102

    Table 4.1: Pattern of variance for the first two images in Figure 4.6 ........................................................129

    Table 4.2: Generalized forms for predicates in difference representations ...............................................138

    Table 4.3: A template and examples of operation calls ............................................................................155

    Table 4.4: A template and examples of control structures ........................................................................160

    Table 4.5: A spatial routine for solving Bongard problems ......................................................................164

    Table 4.6: Pattern of variance for Figure 4.20 ..........................................................................................165

    Table 6.1: Linear model for human reaction times on geometric analogy ...............................................204

    Table 7.1: Linear regression for accuracy of Northwestern students ........................................................233

    Table 7.2: Second linear model for accuracy of Northwestern students ...................................................234

    Table 7.3: Third linear model for accuracy of Northwestern students......................................................235

    Table 7.4: Linear model for reaction times of Northwestern students (in seconds) .................................235

    Table 8.1: Accuracy of the model and each participant group on the 45 oddity task problems ...............248

    Table 8.2: Correlations in accuracy on each of the 45 problems (Pearsons r) .........................................249

    Table 8.3: Rankings of the 6 problems the model failed to solve (1 = easiest, 45 = hardest) ...................249

Table 8.4: Linear models for each group's accuracy .................................................................................253

Table 8.5: Linear models for each group's accuracy, with Elems2 ...........................................................253


Table 8.6: Linear models for each group's reaction times (in s) ...............................................................254


    1. Introduction

Spatial problem-solving tasks are a popular tool for evaluating people's cognitive abilities. For

example, in geometric analogy (Figure 1.1), an individual is shown an array of images and asked "A is to

B as C is to ...?" Like all the tasks I am studying, geometric analogy requires: 1) building up

    representations of two-dimensional images; 2) comparing those representations; and 3) identifying a

pattern across them. Thus, these tasks depend critically on one's ability to encode spatial relations within

an image, compare images, and abstract out higher-order relations between them based on what is

    common or different.

Figure 1.1: Geometric analogy problem.

    In the past, spatial problem-solving has been used to evaluate various abilities, from geometric

    knowledge (Dehaene et al., 2006) to general intelligence (Raven, Raven, & Court, 1998). However, there

    is disagreement about what exactly a particular task evaluates (e.g., geometric analogy: Sternberg, 1977;

    Mulholland, Pellegrino, & Glaser, 1980; Raven's Progressive Matrices: Carpenter, Just, & Shell, 1990;

    Primi, 2001). I believe we need a more concrete understanding of the representations and processes

    people use to solve these tasks. Applied to a single task, this might illuminate the abilities that separate

one person's performance from another's. Applied across tasks, it should help explain how people reason

    about space and spatial relations.


    For my thesis, I have constructed Spatial Routines for Sketches (SRS), a general framework for

building, evaluating, and comparing spatial problem-solving task models. By task model, I mean an end-

    to-end model of human performance on a task, which begins with visual input, generates a representation,

    reasons over the representation, and chooses an output behavior. While the SRS models may vary in their

    specific strategies, they are all built upon three core hypotheses:

1) When possible, people use qualitative, or categorical, representations to reason about space (e.g.,

    Biederman, 1987; Kosslyn et al., 1989; Forbus, Nielsen, & Faltings, 1991). In particular, they encode the

    qualitative spatial relations between elements in a visual scene.

    2) Qualitative, structural representations of space are compared via structure-mapping (Gentner,

    1983). According to structure-mapping theory, people compare two cases by aligning their common

    relational structure, thereby highlighting commonalities and differences across the cases.

    3) Spatial problem-solving requires flexibly moving between different levels in a hierarchy of spatial

    representations (Palmer, 1977) until a suitable level is found. A level of representation is suitable if,

    when the representations are compared, a pattern emerges which can be used to solve the problem. For

    example, Figure 1.1 involves the spatial relations between the objects in each image. However, other

    problems might require focusing on the edges in a single object, or backing out and considering groups of

    objects.

    I have two primary goals in modeling spatial problem-solving. The first is to evaluate whether the

    above hypotheses are sufficient for explaining human performance. A sufficient task model should meet

    two criteria: 1) The model should perform the task with a high degree of accuracy, making no more errors

    than a typical human would. 2) When the model does fail, its error patterns should match human error

    patterns, i.e., problems that are hard for the model should also be hard for people.


    occupations require that individuals be skilled at mentally manipulating representations of images. For

    example, when a surgeon is operating, they may be required to mentally construct a three-dimensional

    structure from a two-dimensional image.

    Importantly, real-life spatial visualization ability can be evaluated by abstract problems like paper-

    folding. Several studies have found correlations between performance on abstract tasks and real-world

    ability in surgeons (see Hegarty et al., 2007, for a review). Shea, Lubinski, & Benbow (2001) found that

    performance of 13-year-olds on spatial visualization tasks predicted what they would study in college and

    what eventual job they would get, even controlling for verbal and mathematical ability. Individuals with

    higher spatial ability were more likely to be scientists and engineers. This analysis was restricted to the

    top 1% of 13-year-olds, but ongoing research (Hedges & Chung, in preparation) suggests the findings

    generalize to the rest of the population.

    Thus, it appears that abstract, spatial problem-solving tasks tap into a spatial visualization ability that

    is useful in many advanced disciplines. If we can understand how people perform spatial transformations

    in these tasks, we will be better prepared to educate students in spatial skills, and hopefully enhance their

    ability to perform in those disciplines.

    General Problem-Solving Ability

    Many problem-solving tasks in the spatial, verbal, and mathematical domains have been used to

evaluate people's abilities. In the past, correlations in performance across these tasks caused Spearman

(1923; 1927) to put forward the notion of g, a single, general intelligence measure which could predict an

individual's ability across a large set of task domains. While this thesis is not directly concerned with g,

Spearman's work suggests that some mental abilities are utilized across a wide range of tasks. If spatial

problem-solving tasks tap into those abilities, then our models can give us insights into people's general

    reasoning processes.


In fact, there is strong evidence that one of the tasks, Raven's Progressive Matrices (RPM), does tap

into general abilities. RPM was originally designed to evaluate one major component of g, eduction, or

    the ability to identify meaningful patterns in confusing data (Raven, Raven, & Court, 1998). Since the

test's creation, several studies have shown it to be one of the highest single-test correlates with g (e.g.,

    Burke & Bingham, 1969; Zagar, Arbit, & Friedland, 1980; see Raven, Raven, & Court, 2000b, for a

review), meaning an individual's performance on RPM predicts their performance on many other

    intelligence tests. In a multi-dimensional scaling analysis of ability tests, Snow and colleagues (Snow,

    Kyllonen, & Marshalek, 1984; Snow & Lohman, 1989) found that RPM lies in the middle of the ability

    space. That is, while most ability tests cluster with other tests in the same domain, e.g., verbal,

mathematical, and spatial tests, RPM correlates highly with the most abstract ability tests from each

    domain. This suggests that although RPM is a visuo-spatial task, it evaluates several domain-general

    mental abilities. My hope, in modeling this task, is to gain a greater understanding of what those abilities

    are.

    1.2 Background

Two of the spatial problem-solving tasks, geometric analogy and the Raven's Progressive Matrices,

    have been studied extensively in the past, while research on the oddity task is more limited. In the

    following sections I summarize the research, focusing on two questions: 1) What are the processes people

    use to perform the task? 2) What factors contribute to the relative difficulty of each problem?

    Geometric Analogy

    Processes

    Geometric analogy (Figure 1.1, Figure 1.2A) has been studied and modeled by many researchers over

the last 50 years. Evans' ANALOGY (1968) was the first computational model of the task. A

    sophisticated program for its time, ANALOGY automatically encoded representations of the images in a


    problem and could even identify spatial transformations between shapes, such as rotations. Once

    representations had been constructed or hand-coded, it solved a problem via what has been termed an

    infer-infer-compare strategy (Mulholland, Pellegrino, & Glaser, 1980):

    1) Infer a mapping between images A and B, describing what changed between them.

2) For each possible answer n, infer a mapping between images C and n.

    3) Compare the A/B mapping to the C/n mapping for each possible answer, and choose the answer for

    which this comparison returns the closest match.
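As a rough illustration of the infer-infer-compare strategy (and not Evans' actual implementation), the following Python sketch encodes each image as a set of qualitative facts and treats a "mapping" as a simple set difference; the fact vocabulary and helper names are illustrative assumptions only.

```python
# Minimal sketch of infer-infer-compare. Each image is a set of qualitative
# facts, e.g. ("above", "dot", "triangle"). The set-based notion of "change"
# is a crude stand-in for a real structure-mapping between images.

def describe_change(image_x, image_y):
    """Describe the change from image_x to image_y as (facts removed, facts added)."""
    return image_x - image_y, image_y - image_x

def change_similarity(change_1, change_2):
    """Score how well two change descriptions match (higher is better)."""
    (rem1, add1), (rem2, add2) = change_1, change_2
    overlap = len(rem1 & rem2) + len(add1 & add2)
    mismatch = len(rem1 ^ rem2) + len(add1 ^ add2)
    return overlap - mismatch

def solve_infer_infer_compare(image_a, image_b, image_c, answers):
    ab_change = describe_change(image_a, image_b)                 # 1) infer the A -> B change
    scores = [change_similarity(ab_change,
                                describe_change(image_c, ans))    # 2) infer each C -> n change
              for ans in answers]
    return max(range(len(answers)), key=lambda i: scores[i])      # 3) pick the closest match

# Toy problem: the dot above the large shape disappears from A to B,
# so it should also disappear between C and the correct answer.
A = {("contains", "triangle"), ("contains", "dot"), ("above", "dot", "triangle")}
B = {("contains", "triangle")}
C = {("contains", "square"), ("contains", "dot"), ("above", "dot", "square")}
answers = [C,                                               # 0: nothing changes
           {("contains", "square")},                        # 1: dot removed (correct)
           {("contains", "square"), ("contains", "dot"),
            ("below", "dot", "square")}]                    # 2: dot moves below
print(solve_infer_infer_compare(A, B, C, answers))          # -> 1
```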

ANALOGY's mapping processes are unlike human mapping processes in that they perform an

    exhaustive search for the best possible mapping, rather than using heuristics (Gentner, 1983). However,

    the infer-infer-compare model remains a reasonable strategy for geometric analogy, and it is one we will

    return to.

Sternberg (1977) argued that people actually solve problems like geometric analogy via an infer-map-

    apply strategy (Mulholland, Pellegrino, & Glaser, 1980):

    1) Infer a mapping between images A and B, describing what changed between them.

    2) Compute a mapping between images A and C, identifying their corresponding elements.

    3) Based on this mapping, apply the A/B differences to C to compute D, a representation of the

    image that would best complete the analogy. This D r epresentation can then be compared to each

    possible answer to see which one best matches it.

    Sternberg produced evidence that people utilized infer-map-apply on his problem set, while

Mulholland, Pellegrino, & Glaser (1980) argued that people utilized infer-infer-compare on their problem

    set. Later, researchers suggested that people adjust their strategy depending on factors such as the

problem's complexity and the relatedness of the items being compared (Bethell-Fox, Lohman, & Snow,

    1984; see also Grudin, 1980, on verbal analogies).

    These early researchers were focused on the particular strategy, or the set of cognitive components,

    that a person would use to solve a geometric analogy. However, they were less clear on what specific


    mechanisms might be used, for example, to compare images A and B. Only Evans (1968) had a complete

    computational model. While computational models of geometric analogy have become more popular in

recent years (e.g., Bohan & O'Donoghue, 2000; Schwering et al., 2009), I believe no other researchers

    have built an end-to-end computational model that could be compared against human performance.

    Factors Contributing to Difficulty

The above researchers also analyzed the factors contributing to each problem's relative difficulty. In

    generating their stimuli, Mulholland, Pellegrino, & Glaser (1980) independently varied the number of

    elements in each image and the number of transformations between images, e.g., the number of

    differences between images A and B. They found that as either number of elements or number of

    transformations increased, the reaction time increased, while the accuracy decreased. Importantly, for

    reaction time the effects of elements and transformations were not simply additive; there was a

particularly great cost when both number of elements and number of transformations increased. The

    researchers believed this was due to working memory load. As the number of elements or

    transformations increased, the working memory load also increased. For particularly large numbers of

both, working memory load exceeded people's capacity, forcing them to change their problem-solving

    strategy, and resulting in substantial reaction time costs.

    Bethell-Fox, Lohman, & Snow (1984) independently varied several factors in their geometric analogy

    problems: number of elements in the images, number of transformations between images, figural vs.

    spatial transformations, number of possible answers to choose from, and similarity of distractors to the

    correct answer. They found that all these factors correlated with reaction times, while all factors except

number of transformations correlated with error rates. They also found some interesting interactions between the factors; for example, those problems with the most images and the most possible answers

    were significantly harder than other problems. Again, this may have been because the large memory load

    required a strategy shift.


    3) Distribution of three values: There are three different elements in the three images of the row.

    Every row must contain each of those three elements, but their order varies.

    4) Figure addition or subtraction: If the first image contains element X and the second image contains

element Y, the third image must contain elements X and Y. Subtraction is similar.

    5) Distribution of two values: There is an element that is present in two of the images but not the

    third.

    For example, to solve Problem 2B, one would study the images in the first row, note that the groups

    of three squares correspond to each other, and recognize that the relation between these groups can best

be described by a quantitative pairwise progression rule: the squares are becoming smaller. One would

then select the answer that best fits this rule in the bottom row.
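To make the rule vocabulary concrete, the hypothetical sketch below checks a row of images, encoded simply as sets of element labels, against three of the rules listed above; these predicates are simplified stand-ins, not the representations or code used by Carpenter, Just, and Shell (1990).

```python
# Illustrative checks for three of the rules, over a row given as three sets
# of element labels. Real matrix problems require far richer encodings.

def distribution_of_three(row):
    """Three distinct elements, each appearing in exactly one image of the row."""
    elems = set().union(*row)
    return len(elems) == 3 and all(sum(e in img for img in row) == 1 for e in elems)

def figure_addition(row):
    """The third image contains exactly the elements of the first two combined
    (subtraction would be checked analogously)."""
    return row[2] == row[0] | row[1]

def distribution_of_two(row):
    """Some element appears in exactly two of the three images."""
    return any(sum(e in img for img in row) == 2 for e in set().union(*row))

print(distribution_of_three([{"diamond"}, {"circle"}, {"triangle"}]))   # True
print(figure_addition([{"square"}, {"dot"}, {"square", "dot"}]))        # True
print(distribution_of_two([{"bar"}, {"bar", "dot"}, {"dot"}]))          # True
```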

Carpenter, Just, and Shell's (1990) first model, FAIRAVEN, could perform the APM about as well as

    the average participant from their subject pool. The second model, BETTERAVEN, performed at the

level of the best participants. BETTERAVEN performed better due to a couple of key changes: 1)

    BETTERAVEN could identify cases where there was an element in only two of the three images in a

row, meaning it could use the distribution of two values rule; 2) BETTERAVEN had better goal

    management. It identified candidate rules one at a time, and it backtracked when a candidate rule proved

    ineffective for solving a problem.

    Based on the difference between the models, the experimenters suggested that the ability to manage

    goals is a key factor in solving the hardest problems. They saw this as being linked to working memory: a

    complex problem requires that one manage a hierarchy of goals, and keeping them all in memory can be

    quite difficult.

    In their analysis, the researchers noticed another interesting difference between high- and low-

    performers on the task. In some cases, there was more than one rule that might be applied to solve a

    problem. For example, suppose the three corresponding elements were an arrow pointed up, an arrow

    pointed right, and an arrow pointed down. This set of elements could be seen as either a quantitative


    pairwise progression, in which an arrow shape gradually rotates, or a distribution of three, with three

    different shapes. While either of these rules is sufficient, the first rule is better, as it is more compact.

That is, one doesn't have to remember all three shapes individually. On problems such as these, verbal

    protocols showed that higher performers used the quantitative pairwise progression rule more often than

    lower performers.

    Carpenter et al. suggested the above result occurred simply because higher performers were more

    consistent about looking for the simpler rules before they looked for the more complex rules. However, I

    think it is likely that all participants looked for the simpler rules first. I interpret these results as

    suggesting, rather, that higher performers are better at identifying more abstract relationships between

    elements. Whereas the lower performers saw the arrows as different shapes, the higher performers were

    better able to compare the arrows, perform a spatial transformation, and identify a rotation between the

arrows' shapes. Identifying more abstract relations on problems like these would reduce working

    memory load and generally aid in solving the problems.

Carpenter et al. failed to address a couple of important components of human processing with their

    models. Firstly, the models did not generate representations from the images; all representations were

    hand-coded based on participant descriptions. The experimenters suggested the visual encoding process

    was irrelevant, given how well performance on the task correlates with other, non-visual tasks. However,

    this assumes visual encoding ability does not generalize to encoding ability in other modalities. I believe

    RPM taps into a general ability to encode a stimulus at the appropriate level of abstraction. Indeed, the

authors themselves suggest that an ability to decrease working memory load by using more abstract

representations might explain the correlation between RPM and the Towers of Hanoi problem, a very

different task.

    Secondly, the FAIRAVEN and BETTERAVEN models did not learn the five rules described above.

    Rather, abstract forms of all the rules were hard-coded into the models. Thus, the models fail to explain


    how individuals begin with basic comparisons of pairs of images and develop, over time, an

    understanding of a complex rule that holds across a row of images.

    Factors Contributing to Difficulty

    Two groups of researchers (Vodegel-Matzen, van der Molen, & Dudink, 1994; Embretson, 1998)

have evaluated Carpenter, Just, and Shell's (1990) analysis by creating their own, experimental matrix

    items in which they varied the number and type of Carpenter rules. Both groups found that a rule

    complexity measure, based on the number and difficulty of rules, correlated highly with problem

    difficulty: problems with more rules, and with more difficult rules, are harder to solve. This is likely

    because such problems both put a greater load on working memory and require more sophisticated

    problem-solving techniques. I believe a computational model which makes those problem-solving

    techniques explicit is needed to gain a better grasp on what exactly makes more complex problems

    harder.

    Primi (2001) designed new matrix items in which he independently varied the number of elements in

    each image, the number of rules describing a row of images, the complexity of rules, and perceptual

organization. Perceptual organization referred to how difficult it was to identify the corresponding

elements in a row of images. In his low perceptual organization problems, the correspondence-finding

    component was made difficult through misleading cues that encouraged test-takers to put the wrong

    elements into correspondence. Perhaps unsurprisingly, he found that difficulty of correspondence-finding

    was the greatest predictor of problem difficulty.

The official RPM problems, unlike the problems in Primi's test set, were not explicitly designed to

make correspondence-finding difficult. Nonetheless, there are some problems which invite inappropriate correspondences (Carpenter, Just, & Shell, 1990). In such cases, I suspect the most important skill is an

    ability to backtrack and look for alternate mappings when a set of correspondences prove insufficient for

    solving a problem. Thus, dealing with incorrect correspondences, like dealing with excessive memory

    load, may require an ability to evaluate and dynamically modify problem-solving strategies.


    Oddity Task

    Processes

    The oddity task (Figure 1.2B) is a set of 45 problems which Dehaene et al. (2006) constructed to

evaluate people's geometric knowledge. The experimenters hypothesized that the problems tap into core

    knowledge of geometry (Spelke & Kinzler, 2007), an innate, universal cognitive module. This module

    would presumably understand key geometric concepts like perpendicular lines, and would therefore be

    used to solve problems like Figure 1.2B.

    To test the universality of geometric knowledge, Dehaene et al. gave their 45 problems to participants

of varying ages from two cultural groups: North Americans and the Mundurukú, a South American

indigenous group. They found that the Mundurukú performed above chance on nearly all the problems,

    despite their lack of formal schooling in geometry and mathematics. Furthermore, there was a significant

correlation between the American and Mundurukú error patterns. The experimenters took this as

    evidence that the two groups were working with a shared, universal set of core geometric knowledge.

    As this work is relatively new, no other researchers have explored the processes people might utilize

    in performing the task. However, I have suggested people likely use processes similar to those used in the

previous tasks (Lovett & Forbus, 2011). Whereas geometric analogy involves comparing two images

    and noticing their differences, the oddity task involves comparing images and noticing their

    commonalities. Later, I will argue that both of these comparisons can best be modeled as structure-

    mapping processes (Gentner, 1983).

    Factors Contributing to Difficulty

    The oddity task problems (Dehaene et al., 2006) were designed to evaluate understanding of two-

    dimensional spatial concepts. As such, they vary greatly in the type and complexity of the spatial

    concepts they test for. Several of the more difficult problems involve rotations or reflections between


    shapes. Thus, as with geometric analogy, people may have more difficulty with problems that require

    spatial visualization.

Importantly, these problems do not merely rely on one's ability to mentally rotate shapes. They also

    require that one notice that there is a possible shape transformation, and then put in the effort to compute

    the transformation. Thus, there may be a motivational element. Recall that on the RPM, higher-

    performing participants were more likely to notice transformations like a rotation of an arrow shape, even

when the transformation wasn't strictly necessary for solving the problem. I believe that across all three

    tasks, recognition of spatial transformations depends on: 1) spatial visualization ability, and 2) a general

intellectual interest in comparing elements and looking for more abstract relationships between them.

    1.3 Claims

    My thesis rests on two broad sets of claims. The first set is about the representations and processes

    we can use to model human spatial problem-solving. The second set is about what we can learn from

these models, in terms of the factors that contribute to a problem's difficulty and the processes that

    explain variation in human performance.

    Modeling Spatial Problem-Solving

    The studies described above tell us a little about the processes, and less about the representations, that

    people utilize during spatial problem-solving. There are several open questions that must be resolved to

    model the tasks from end to end. I propose the following hypotheses about human representations and

    processes:

1) When possible, people utilize qualitative representations of the spatial relations between elements

    in an image (e.g., Biederman, 1987; Kosslyn et al., 1989; Forbus, Nielsen, & Faltings, 1991) when

    reasoning about space.


    2) These qualitative representations are hierarchical (Palmer, 1977). That is, one can reason about

    relations between objects in an image, or one can focus in and reason about relations between parts of an

    individual object, or one can zoom out and reason about relations between groups of objects.

    3) Qualitative, relational representations are compared via structure-mapping (Gentner, 1983, 1989), a

    domain-general process of structural alignment. Structure-mapping can identify commonalities and

    differences in two representations, or it can determine their similarity.

    4) The computation of spatial transformations, such as rotations between shapes, is accomplished via

    an interaction between hierarchical representations and structure-mapping. To compute a transformation

between two shapes, an individual uses structure-mapping over representations of the shapes' parts to

    identify corresponding parts. Then, a spatial transformation is applied to put the corresponding parts into

    alignment.

    5) Spatial problem-solving requires control processes which can look over a problem, apply a

strategy (where a strategy consists primarily of a set of comparisons via structure-mapping) and, when a

    strategy fails, backtrack and attempt a new strategy.

    In the following subsections, I briefly justify each hypothesis and then outline how it can be modeled.

    1) Qualitative Representations of Spatial Relations

    There is good evidence that people are sensitive to the qualitative, or categorical, relations between

    objects in a visual scene. For example, parallel lines (Abravanel, 1977) and concave corners between

    edges of a shape (Bhatt et al., 2006) are particularly salient to people, suggesting that we make a

    qualitative distinction between parallel and non-parallel, or between concave and convex. In some cases,

people go out of their way to identify a qualitative relation between objects: when people are asked to memorize the location of a dot in a circle, there is evidence they mentally divide the circle into four

    quadrants and then qualitatively encode which of the four quadrants the dot was located in (Huttenlocher

    et al., 1991).
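As a toy illustration of how such qualitative distinctions might be computed from quantitative input, the sketch below reduces coordinates and orientations to categorical facts like those just described; the thresholds and function names are my own illustrative assumptions, not the encoding scheme used by the models in later chapters.

```python
# Toy qualitative encoders: quantitative input in, categorical facts out.

def dot_quadrant(dot_xy, circle_center):
    """Which quadrant of the circle the dot lies in (cf. Huttenlocher et al., 1991)."""
    dx, dy = dot_xy[0] - circle_center[0], dot_xy[1] - circle_center[1]
    return ("upper" if dy >= 0 else "lower") + "-" + ("right" if dx >= 0 else "left")

def parallel(orientation1_deg, orientation2_deg, tolerance_deg=5.0):
    """Qualitative parallel / non-parallel distinction between two edge orientations."""
    diff = abs(orientation1_deg - orientation2_deg) % 180.0
    return min(diff, 180.0 - diff) <= tolerance_deg

print(dot_quadrant((3.0, -1.5), (0.0, 0.0)))   # -> lower-right
print(parallel(12.0, 193.0))                   # -> True (orientations differ by ~180 degrees)
```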


correspondences between elements in the two cases; 2) a structural evaluation score, a measure of the

similarity of the cases based on the systematicity of the mapping; and 3) candidate inferences, inferences

    about the target based on elements in the base that failed to map to the target. Candidate inferences can

    be used to identify differences between the base and the target.

    While structure-mapping was originally proposed to explain abstract analogies, there has been

mounting evidence that it can also explain people's concrete comparisons of visual stimuli (Markman &

    Gentner, 1996; Lovett et al., 2009a; Sagi, Gentner, & Lovett, in press). I believe it may play a ubiquitous

    role in spatial problem-solving because of its usefulness at various stages in the problem-solving process.

    For example, the infer-infer-compare strategy for geometric analogy, described above, relies on both

    computing the differences between images and comparing two sets of differences. SME can do both

    these things, as the sets of differences computed by SME can themselves be compared via SME in a

    second-order comparison (Lovett et al., 2009b).

    4) Structure-Mapping in Spatial Transformations

    One of my goals is to better understand how people perform spatial visualization, particularly how

they compute transformations between shapes. Work on mental rotation (Shepard & Metzler, 1971;

Shepard & Cooper, 1982) has shown that the time required to determine that one shape is a rotation of

    another is proportional to the degrees of rotation between them. This suggests that people perform this

    task by mentally rotating their representation of one shape to align it with the other. If so, people cannot

    be working solely with a qualitative shape representation, as described above. Qualitative representations

    would not include the absolute orientations of shape parts, so there would be no need to transform them in

an analog fashion.

I believe mental rotation is performed on a quantitative, orientation-specific shape representation.

    While the exact form of this representation is unclear, it can be modeled simply as the set of edges in a

shape, along with each edge's quantitative orientation, length, and location. However, a mystery remains.

    Typically, the time to perform a mental rotation is proportional to the degrees of rotation along the


    shortest possible rotation between the two shapes. How do people know which way to rotate one shape to

    quickly align it with the other?

    To answer this question, I have proposed a three-stage model of mental rotation:

    1) Compare qualitative edge-level representations via structure-mapping. Identify the corresponding

    edges.

    2) Take a single pair of corresponding edges and calculate the shortest possible rotation between

    them.

    3) Apply this rotation to the full set of quantitative edges in the first shape. Evaluate whether this

    aligns those edges with the edges in the second shape.

    This approach can identify other transformations, such as reflections, as well. I describe the approach

    in detail in Chapter 3.
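A minimal numeric sketch of these three stages appears below. It assumes that stage 1 (the structure-mapping comparison) has already produced the pairs of corresponding edges, and that both shapes are expressed as edge endpoints around a shared center; everything else, including the function names, is illustrative rather than actual model code.

    import math

    def orientation(edge):
        (x1, y1), (x2, y2) = edge
        return math.atan2(y2 - y1, x2 - x1)

    def shortest_rotation(theta_a, theta_b):
        """Stage 2: smallest signed rotation (radians) taking orientation a to b."""
        return (theta_b - theta_a + math.pi) % (2 * math.pi) - math.pi

    def rotate(point, angle):
        x, y = point
        return (x * math.cos(angle) - y * math.sin(angle),
                x * math.sin(angle) + y * math.cos(angle))

    def rotation_aligns(shape_a, shape_b, pairs, tol=1e-6):
        """Stage 3: apply the candidate rotation to every edge of shape_a and check
        that each lands on its corresponding edge in shape_b.
        `pairs` is the stage-1 output: a list of (index_in_a, index_in_b)."""
        i, j = pairs[0]
        angle = shortest_rotation(orientation(shape_a[i]), orientation(shape_b[j]))
        for i, j in pairs:
            rotated = [rotate(p, angle) for p in shape_a[i]]
            if any(math.dist(p, q) > tol for p, q in zip(rotated, shape_b[j])):
                return None
        return angle

    # A unit horizontal edge vs. the same edge rotated 90 degrees about the center.
    a = [((0.0, 0.0), (1.0, 0.0))]
    b = [((0.0, 0.0), (0.0, 1.0))]
    print(rotation_aligns(a, b, [(0, 0)]))  # ~1.5708 radians (90 degrees)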

    5) Control Processes

    I have argued that structure-mapping can account for much of the processing in spatial problem-

    solving. However, there must also be a control process which chooses a strategy, oversees the application

    of that strategy, and selects a new strategy if the original fails to produce a solution. New strategies may

be necessary when a particularly complex problem overloads working memory (Mulholland,

    Pellegrino, & Glaser, 1980; Bethell-Fox, Lohman, & Snow, 1984), or when a tricky problem causes one

    to identify the wrong corresponding elements (Primi, 2001).


Figure 1.4 Two related geometric analogy problems (A and B), along with average reaction times (A: 6.7 s; B: 26.7 s) and overall accuracies (A: 100%; B: 56%) for human participants.

For example, consider the geometric analogy problems in Figure 1.4. While these problems are nearly identical, the second one is significantly harder (Lovett et al., 2009b) because the obvious mapping between images A and B, in which the large triangle in A corresponds to the large triangle in B, is incorrect. Thus, the problem requires backtracking and identifying an alternate, less obvious mapping, in which the small triangle in A corresponds to the large triangle in B. Clearly, this type of backtracking is difficult for people.

I have built control processes directly into Spatial Routines for Sketches. A spatial routine can include loops that iterate through multiple strategies until a sufficient solution is found; a sufficient solution may be one whose SME mapping receives a reasonably high score, or one whose SME comparison contains no differences. Spatial routines can also contain nested loops, allowing multiple combinations of strategies to be attempted. Thus, a spatial routine is much like a computer program (Ullman, 1987).
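In rough pseudo-Python, such a control loop might look like the sketch below. This is not the actual SRS routine language; the strategy representation, threshold, and names are all illustrative.

    def run_routine(problem, strategies, threshold=0.8):
        """Try each strategy in turn; accept the first solution whose comparison
        score is high enough, otherwise backtrack and try the next strategy."""
        for strategy in strategies:
            solution, score = strategy(problem)
            if solution is not None and score >= threshold:
                return solution
        return None  # every strategy failed; a real routine might relax its criteria

    # Two toy strategies: the first fails, the second succeeds.
    strategies = [lambda p: (None, 0.0), lambda p: ("answer-3", 0.9)]
    print(run_routine("problem-12", strategies))  # -> "answer-3"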

Note that I do not model working memory capacity. Because there are no limits to a model's working memory, it will never fail to solve a problem because too many things must be kept in memory. Similarly, it will never forget the previously completed steps of a problem and need to rework them. I


    view the modeling of working memory capacity as future work which can be attempted when spatial

    working memory is better understood.

    Using the Models to Study Human Performance

    Once models of spatial problem-solving have been developed, we can use them to study human

    performance on the tasks, identifying the factors that make one problem easier or harder than another.

    While several other researchers have considered this issue, an end-to-end task model allows us to be

much more explicit about how a particular factor puts more load on the model's representations and

    processes. The models can also be used to automatically tag each problem with information such as the

    number of elements that must be represented, the number of operations that must be applied to solve the

    problem, etc. Thus, a model can code a problem more objectively than if we simply hand-code each

    problem stimulus based on our best guesses about what the problem requires.

    Once problems are coded for different factors, we can build mathematical models of performance

    matching particular individuals, or groups, in order to determine how individuals vary in their ability to

    handle different sources of difficulty. In this way, we can hopefully develop a better understanding of

commonalities and differences in people's spatial reasoning ability.

    In my analysis, I focus on three broad classes of factors: encoding & abstraction, working memory

    load, and control processes. I describe each of these next.

    Encoding & Abstraction

    Encoding & abstraction refers to the ability to encode representations at the appropriate level of

    abstraction for solving a problem. Some problems may be more difficult because they require

    representations that people are less inclined or less able to compute. I am interested in two forms of

abstraction, which I call entity abstraction and relational abstraction.

    Problems that are high on entity abstraction require the solver to encode elements at a particular level

    in the representational hierarchy. This might be either a larger scale or a smaller scale than the most


comfortable level for an individual. To demonstrate, let us consider a simple form of analogical problem-solving: the letter string analogy (Mitchell, 1993; Hofstadter, 1995). The simplest letter string analogy is:

abc : abd :: rst : ?

Here, the obvious solution is "rsu", based on incrementing the final letter by one. Suppose instead we have:

abc : abd :: rrsstt : ?

While an obvious solution here would be "rrsstu", a better one might be "rrssuu". Discovering this solution depends on entity abstraction. One must recognize that the appropriate level of abstraction for representing the "rrsstt" letter string is as two-letter groups. The entire "tt" group in "rrsstt" maps to the letter "c" in "abc".
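This re-representation step is easy to state computationally; the short Python sketch below (an illustration, not part of any letter-string model cited here) collapses runs of identical letters into group-level entities.

    from itertools import groupby

    def letter_groups(s):
        """Re-represent a letter string at the group level: runs of identical
        letters become single entities, e.g. "rrsstt" -> ["rr", "ss", "tt"]."""
        return ["".join(run) for _, run in groupby(s)]

    print(letter_groups("rrsstt"))  # ['rr', 'ss', 'tt'] -- three entities, like "abc"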

On the other hand, problems that are high on relational abstraction require the solver to notice more complex relations between elements. Often, this involves comparing elements within a single image (or letter string) to recognize commonalities and differences among them, before one compares across images (or letter strings). For example, consider the problem:

abc : abd :: hkkuuu : ?

Here, an obvious solution might be to apply the letter successor relation and get "hkkuuv" or "hkkvvv". However, if one spends more time considering the relationships between the elements in "hkkuuu", one should recognize that this letter string contains a new kind of successor relationship: group length. One might then choose to increment this successor relationship instead, to get "hkkuuuu".
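Again purely for illustration, the sketch below detects whether the group-length successor relation holds within a string; the helper names are hypothetical.

    from itertools import groupby

    def group_lengths(s):
        """Lengths of the runs of identical letters, e.g. "hkkuuu" -> [1, 2, 3]."""
        return [len(list(run)) for _, run in groupby(s)]

    def has_length_successor(s):
        """True when each group is one letter longer than the previous group,
        i.e. the successor relation holds over group length rather than letters."""
        lengths = group_lengths(s)
        return all(b == a + 1 for a, b in zip(lengths, lengths[1:]))

    print(has_length_successor("hkkuuu"))  # True  -> increment the length: "hkkuuuu"
    print(has_length_successor("rrsstt"))  # False -> fall back on the letter successor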

    Of course, entity and relational abstraction occur in slightly different forms in spatial problem-

    solving. Entity abstraction is the ability to build a representation at the level of groups of objects, objects,

    or parts of an object, as the problem demands. Relational abstraction is the ability to recognize complex

    relations, particularly spatial transformations such as rotations or reflections. I use the letter string

    examples to illustrate that these abilities are not specific to the domain of spatial problem-solving. Entity


    and relational abstraction may rely on a general inclination to try out different abstractions for

    representing stimuli.

Unfortunately, my analysis is insufficient for distinguishing between the inclination and the ability to form abstractions. Thus, relational abstraction refers to both an interest in comparison and abstraction, and an ability to compute spatial transformations between shapes, i.e., spatial visualization ability.

    The simplest way to determine whether a certain problem requires entity or relational abstraction is

via ablation. That is, one can temporarily remove the model's ability to generate representations at a

    particular level, or to compute spatial transformations, and check whether the model now fails on the

    problem.
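Schematically, an ablation test is just a re-run of the model with one capability switched off; the sketch below makes this explicit, with a toy model and made-up capability names standing in for the real system.

    def requires_capability(model, problem, capability):
        """Ablation test: the problem requires the capability if the model solves
        it normally but fails once that capability is disabled."""
        solves_normally = model(problem, disabled=set())
        solves_ablated = model(problem, disabled={capability})
        return solves_normally and not solves_ablated

    # Toy model: succeeds only when group-level encoding is available.
    toy_model = lambda problem, disabled: "group-level-encoding" not in disabled
    print(requires_capability(toy_model, "oddity-17", "group-level-encoding"))  # True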

    Working Memory Load

Working memory load is simply the number of things that must be kept in mind when solving a problem. Several studies have suggested that people are slower and less accurate when solving problems that involve more elements or more transformations between elements (e.g., Mulholland, Pellegrino, & Glaser, 1980; Bethell-Fox, Lohman, & Snow, 1984; Embretson, 1998; Vodegel-Matzen, van der Molen, & Dudink, 1994). One advantage of computational models is that they can automatically code the number of elements, number of relations, number of differences between representations, etc., that are used to solve a problem. Thus, we can easily code a problem for working memory load. We can evaluate to what extent these different factors (number of elements, number of relations, etc.) place a load on working memory by considering which factors correlate with problem difficulty.
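The analysis itself is straightforward once the model has produced the codes. The sketch below, using entirely hypothetical per-problem counts and error rates, shows the kind of factor-by-factor correlation intended here (statistics.correlation requires Python 3.10 or later).

    from statistics import correlation

    # Hypothetical codes a task model might emit for four problems, plus observed error rates.
    problems = [
        {"objects": 2, "relations": 1, "differences": 1, "error_rate": 0.05},
        {"objects": 3, "relations": 3, "differences": 2, "error_rate": 0.15},
        {"objects": 4, "relations": 5, "differences": 3, "error_rate": 0.30},
        {"objects": 5, "relations": 6, "differences": 4, "error_rate": 0.45},
    ]

    difficulty = [p["error_rate"] for p in problems]
    for factor in ("objects", "relations", "differences"):
        values = [p[factor] for p in problems]
        print(factor, round(correlation(values, difficulty), 2))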

    Control Processes

    The articles cited above argue that more complex problems are harder only because of increased

    working memory load. However, an alternate explanation is that, in some cases, more complex problems

require more careful control over problem-solving strategies. For example, in Raven's Progressive

    Matrices, as the number of elements increases, the problem of identifying the corresponding elements


    becomes more difficult. Thus, one is more likely to identify incorrect corresponding elements on the first

    try, fail to solve the problem, and require backtracking.

Carpenter, Just, and Shell (1990) have referred to this problem as "goal management" and have

    suggested that goal management itself boils down to working memory, since more complex problems

    require keeping a larger hierarchy of goals in memory. However, I believe we should consider control

    processes as a separate factor, particularly as the computational models allow us to directly code the

    number of operations and the amount of backtracking required to solve a problem. Thus, we can look at

    whether this factor explains any of the variance in problem difficulty, beyond what working memory load

    explains.
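One way to test this is incremental regression: fit problem difficulty from the working-memory factors alone, then add the control-process factors and see whether explained variance improves. The sketch below does this with NumPy least squares on invented data; all numbers are hypothetical and serve only to show the shape of the analysis.

    import numpy as np

    # Hypothetical problem codings: working-memory factors (elements, relations)
    # and a control-process factor (backtracking steps the model needed).
    X_wm = np.array([[2, 1], [3, 3], [4, 5], [5, 6], [4, 4], [6, 7]], dtype=float)
    backtracks = np.array([0, 0, 1, 2, 0, 3], dtype=float)
    difficulty = np.array([0.05, 0.15, 0.40, 0.60, 0.25, 0.80])

    def r_squared(X, y):
        X1 = np.column_stack([X, np.ones(len(y))])       # add an intercept term
        coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
        residuals = y - X1 @ coef
        return 1 - residuals.var() / y.var()

    print("WM load only:        ", round(r_squared(X_wm, difficulty), 3))
    print("WM load + backtracks:", round(r_squared(np.column_stack([X_wm, backtracks]), difficulty), 3))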

    1.4 Contributions

    My thesis consists of five contributions. Firstly, the Perceptual Sketchpad is an extension to the

    CogSketch sketch understanding system. Given a set of objects drawn in CogSketch, the Perceptual

    Sketchpad automatically segments each object into its edges and groups objects together based on

    similarity. It generates human-like qualitative representations at three hierarchical levels: groups, objects,

    and edges.
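The hierarchical output can be pictured as a simple nested data structure. The Python sketch below is only a schematic picture of the three levels; the real Perceptual Sketchpad works over CogSketch's predicate-calculus representations, and the class and slot names here are invented.

    from dataclasses import dataclass, field

    @dataclass
    class Edge:
        kind: str                                       # e.g. "straight" or "curved"
        relations: list = field(default_factory=list)   # e.g. ["parallel-to edge-2"]

    @dataclass
    class SketchObject:
        name: str
        edges: list
        relations: list = field(default_factory=list)   # e.g. ["left-of object-2"]

    @dataclass
    class Group:
        members: list                                   # objects grouped by similarity
        relations: list = field(default_factory=list)

    square = SketchObject("object-1",
                          edges=[Edge("straight") for _ in range(4)],
                          relations=["left-of object-2"])
    circle = SketchObject("object-2", edges=[Edge("curved")])
    scene = Group(members=[square, circle])
    print(len(scene.members), len(square.edges))  # 2 objects, 4 edges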

Secondly, Spatial Routines for Sketches (SRS) is a modeling framework inspired by Ullman's (1987) visual routines. As with visual routines, SRS utilizes a set of cognitive operations that we know people

    can accomplish. These include perceptual encoding and comparison via SME. The operations can be

    combined to create a spatial routine for performing some spatial task. For example, a spatial routine

    could describe one strategy for solving geometric analogy problems. A spatial routine is analogous to a

    computer program: each operation takes an input, processes it, and produces an output.

    SRS allows researchers to construct cognitive models of human performance on a task. Once a

    researcher has written a routine, SRS implements its operations, allowing the routine to be tested on


    for human accuracy and reaction times. In this way, the models can help us better understand what makes

    a problem easy or difficult for people.

    Finally, for the oddity task only, I have built separate linear models for the different cultural and age

    groups. These models highlight some commonalities and differences in the representations and processes

    people use to reason about space.

    1.6 Outline of the Thesis

    In the following two chapters, I consider the key problems of visual encoding and comparison. I

begin by outlining the psychological evidence for hierarchical hybrid representations, containing both

    qualitative and quantitative features at multiple levels of abstraction. I show how even basic visual

    comparisons require a strategic search through this hierarchy. I then describe my implementations of

    perceptual encoding, image comparison, and shape comparison.

    In Chapter 4, I present Spatial Routines for Sketches (SRS), a general framework for modeling spatial

    problem-solving. I enumerate its key operations, including perceptual encoding, visual comparison, and

    visual inference.

    In Chapter 5, I give an introduction to cognitive modeling with SRS, discussing the general principles

    that go into all my task models.

In Chapters 6-8, I describe the task models for geometric analogy, Raven's Progressive Matrices, and

    the oddity task. I present results comparing each model against human performance.

    In Chapter 9, I discuss predictions that follow from my models. Several hypotheses and assumptions

    went into the production of each model, and these can be explicitly tested in future psychological studies.

    Finally, in Chapter 10, I draw conclusions, discuss related work, and consider directions for future

    research.


    2. Background

Spatial problem-solving depends on the ability to compare images and identify similarities and differences. This is difficult because even simple images contain many features, of which only a few are critical to a given comparison. For example, Figure 2.1A displays an oddity task problem, in which one must choose the image that doesn't belong. Each image contains four circles. The circles possess various features (e.g., roundness), and there are several spatial relations between the circles. On the other hand, each image also contains a row of dots. In five of the images, there is only a single row, whereas the last image contains a row plus an additional outlying dot. Thus, it is much easier to pick the odd image out if we view the images as rows than if we view them as individual circles. However, we have no way of knowing a priori what features will be important when we consider a problem.

Figure 2.1 Two oddity task problems (A and B) from Dehaene et al. (2006). Pick the image that doesn't belong.

    To solve these problems, one must flexibly move between candidate representations until a

comparison's critical features are discovered. Here, I argue that human perceptual systems produce

    hierarchical hybrid representations (HHRs) of space. These representations are hierarchical in that a

    given stimulus can be encoded at several levels of abstraction. They are hybrid in that, at each level, there

    are at least two types of representations: qualitative and quantitative. For example, consider the edges of

    the objects in Figure 2.1B. These edges possess several quantitative features, such as length and


orientation, that are irrelevant to the problem. However, if we focus on the qualitative relations between

    edges, we see that in all images except one, there are two pairs of parallel edges.

    Psychological evidence suggests hierarchical representations are explored top-down (Hochstein &

    Ahissar, 2002). That is, an individual will first compare two stimuli at a high level of abstraction and

    then move down as needed. In contrast, the temporal interaction between qualitative and quantitative

    representations is less clear (Bornstein & Korda, 1984; Kosslyn et al., 1977). However, in both cases,

    individuals may strategically reprioritize representations, depending on the task demands.

    Top-down comparison of hierarchical representations may depend on identifying corresponding

    elements in structural descriptions. I will argue that structure-mapping theory (Gentner, 1983, 1989)

    provides a parsimonious explanation of how people align representations during visual comparison.

    I begin by presenting psychological evidence for hybrid and for hierarchical representations. I then

    discuss structure-mapping as a mechanism for visual comparison over HHRs. Afterwards, I apply HHRs

    to mental rotation, a commonly studied comparison task. Finally, I consider how individuals strategically

    search through HHRs during comparison.

2.1 Hybrid Representations

Evidence for hybrid representations comes from at least three bodies of research: categorical

    perception, object recognition, and visual priming. I consider each of these in turn. Many of the

    experiments below use the same/different paradigm. In visual same/different experiments (Farell, 1985),

    participants are shown two images, either sequentially or simultaneously, and asked whether they are the

    same or different. By varying the type and degree of differences, researchers can study the form of visual

    representations. Furthermore, by varying the exposure time and the interval between stimuli, researchers

    can potentially isolate the contributions of encoding, memory, and comparison processes. As we shall

    see, same/different experiments provide evidence for at least two distinct representations: qualitative and

    quantitative.


    Categorical Perception

    Categorical perception is our ability to use qualitative categories when distinguishing between

quantitative values. In categorical perception experiments, participants compare stimuli that typically vary along a single dimension (e.g., color: Bornstein & Korda, 1984; size: Kosslyn et al., 1977; or

    location: Maki, 1982). A common finding is that the more distant the stimuli are along that dimension,

    the more quickly an individual can recognize that they are different, or that one is greater than the other.

    However, when the stimuli have different categorical values, performance improves significantly.

For example, Bornstein and Korda (1984) showed participants color patches in a sequential same/different task. They used seven patches that varied from blue to green, such that the middle patch (4) was ambiguous in color. Thus, they could independently vary the quantitative difference and the qualitative difference (same-color vs. different-color) between patches. They found that for pairs with a distance of two, participants responded "different" faster when the patches were different colors than when they were the same color, suggesting that participants used the color labels when comparing. However, participants could also distinguish between different patches with the same label, indicating they had access to the quantitative hue value. Bornstein and Korda theorized that participants were comparing the quantitative and qualitative color values in parallel, and responding as soon as either comparison returned a difference.
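A toy race model captures the logic of that explanation: run a quantitative comparison and a qualitative comparison in parallel and respond with whichever finishes first. The Python sketch below is only an illustration of the idea; its timing parameters are made up and are not fit to Bornstein and Korda's data.

    def race_model_rt(hue_a, hue_b, label_a, label_b,
                      base=400.0, quant_gain=600.0, qual_bonus=150.0):
        """Hypothetical response time (ms) for a "different" judgment: a quantitative
        hue comparison and a qualitative label comparison race in parallel."""
        if hue_a == hue_b and label_a == label_b:
            return None  # a "same" response; not modeled here
        quant_rt = base + quant_gain / max(abs(hue_a - hue_b), 1e-6)
        qual_rt = base + qual_bonus if label_a != label_b else float("inf")
        return min(quant_rt, qual_rt)

    # Same quantitative distance (two patches apart), different vs. same category:
    print(race_model_rt(3, 5, "blue", "green"))   # faster: the label comparison wins
    print(race_model_rt(5, 7, "green", "green"))  # slower: only the hue comparison finishes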

    Categorical perception has been studied extensively in the color domain (e.g., Bornstein & Korda,

    1984; Roberson, Pak, & Hanley, 2008; Regier & Kay, 2009). However, it has also been found in other,

    more spatial domains. Kosslyn et al. (1977) taught participants to draw 6 stickmen of varying sizes and

also trained them to characterize the three largest as "large" and the three smallest as "small". Similarly,

    Maki (1982) taught participants the locations of twelve cities on a map, in which six cities were in a

    northern state and six were in a southern state. Afterwards, participants were given the names of stimulus

    pairs and asked to judge which was larger or farther north. Both researchers found that participants


    responded faster when the stimuli were farther apart along the relevant dimension. However, participants

    also responded faster to stimuli on opposite sides of the category boundary they had learned.

Categorical perception has also been found for spatial relations between an object's parts. Rosielle and Cooper (2001) used line-drawings of 3-D objects in a sequential same/different study. Each object had two parts connected by an arm. The arm's orientation varied along four values: 0°, 30°, 60°, and 90°. For the first value (0°), participants might recognize a qualitative parallel relationship between the parts. For the last value (90°), participants might recognize a perpendicular relationship. However, either of the middle two values would likely be labeled as oblique. Rosielle and Cooper compared performance across the conditions where the two objects had identical parts but their angles differed by 30°. They found that participants responded slowest (and error rates were highest) when the object changed from one oblique angle to the other. Participants performed better when the object changed to or from a parallel or perpendicular angle, suggesting they were encoding and using these qualitative spatial relations.

    Finally, Ferguson, Aminoff, and Gentner (1996) showed that categorical perception can play a role

    even when perceiving a single object. They showed participants 2-D polygons and asked them to

    evaluate whether each polygon possessed an axis of symmetry. This is essentially a same-different task

    in which the participant is comparing one half of a shape to the other. The experimenters looked at the

    time and accuracy to detect purely quantitative asymmetries vs. qualitative asymmetries, where the latter

    included changes in the number of vertices and changes between concave and convex vertices. The

    experimenters attempted to keep the degree of quantitative asymmetry constant while varying the

    presence of qualitative asymmetries. They found that participants were better able to detect qualitative

    asymmetries, suggesting again that participants use qualitative spatial relations during comparison.

These studies indicate that participants consider both quantitative and qualitative features during visual comparison. Features may be purely visual (e.g., color) or spatial. Spatial features may include attributes of individual objects or relations between elements (e.g., relative orientation, convex vs.


    concave). It is less clear whether individuals prefer one type or the other. Some researchers (Bornstein &

    Korda, 1984; Kosslyn et al., 1977) have suggested that qualitative and quantitative features are compared

    in parallel, so that a large difference along either dimension produces a fast response. We shall return to

    this question later.

    Object Recognition

A heated debate in the field of object recognition provides further evidence. Here, the question is how people rapidly recognize objects from different viewpoints. Biederman and colleagues (Biederman, 1987; Hummel & Biederman, 1992) have proposed that people represent an object as a set of shape primitives, called geons, with qualitative spatial relations between the geons. Geons and relations are identified based on non-accidental properties, properties of the image that are unlikely to be tied to a specific viewpoint. For example, if two edges along a shape's surface appear parallel from one viewpoint, they will also appear parallel from most other viewpoints. Thus, a given object should produce a similar geon representation from many different viewpoints, facilitating the recognition process.

A set of geons, features of the geons, and spatial relations between the geons make up a geon structural description (GSD). This approach aligns with the qualitative representations described above in that GSDs capture qualitative features of, and relations between, the parts of an object. It also provides some pointers to what qualitative features might be encoded: those features which are unlikely to have occurred accidentally, due to viewing a scene from a particular viewpoint.

Biederman and colleagues predict that any rotation which does not alter an object's GSD should not interfere with one's ability to recognize it. They have shown (Biederman & Gerhardstein, 1993) that

    participants can quickly recognize two objects as the same, regardless of rotation in depth, provided they

    generate the same GSD. Similarly (Biederman & Bar, 1999), participants can quickly recognize that two

    objects are different, regardless of rotation in depth, when they generate different GSDs, i.e., they differ in

  • 7/27/2019 Lovett Thesis Final

    47/312

    47

    a non-accidental property. However, when two objects differ only in a quantitative property, recognizing

    they are different over rotations is difficult. This fits well with the above finding (Rosielle & Cooper,

    2001) that participants can better tell two objects are different when the change in their angles is

    qualitative, rather than quantitative.

    However, these results have not gone unchallenged. Tarr and colleagues (Tarr et al., 1997; Tarr et al.,

    1998; Hayward & Tarr, 2000) have argued that we use viewpoint-dependent representations to recognize

    objects. They claim that we deal with multiple viewpoints by learning several different representations

    for a given object, and by mentally transforming between a novel view and the closest known view.

    Thus, they predict that when an object is unfamiliar, there should be a cost for recognizing it from a novel

    viewpoint. In support of this, their studies have shown a cost for recognizing objects from new

    viewpoints, even when the GSDs are mostly identical. These studies used the same paradigms (Tarr et

    al., 1997) and sometimes even the same stimuli (Tarr et al., 1998) as Biederman and colleagues. Thus,

    apparently very subtle differences can shift participants between representations that are more or less

    viewpoint-specific.

    Biederman and Bar (1999) suggest one factor is the accessibility of the geons in a 3-D shape. Even

    differences to a single vertex can make it harder to recognize a rotated version of an object (Biederman &

    Bar, 1998). Of course, this means the viewpoint-invariant representation is quite brittle. If it is so easily

    disrupted, then one cannot expect two viewpoints of the same object to produce identical representations

    consistently. On the other hand, viewpoint-invariant recognition is achievable in some situations

    (Biederman & Gerhardstein, 1993). Thus, the results suggest: a) we can access a viewpoint-invariant

    representation, such as GSD; b) this representation can be used to quickly compare two object images; c)

    when this representation is insufficient, we must fall back on viewpoint-specific information to perform

    the comparison. Here, again, we see a hybrid perceptual representation. One component is qualitative

    and primarily viewpoint-invariant. The other contains quantitative and viewpoint-specific information.


    Visual Priming

    Further support for hybrid representations comes from visual priming studies. Hummel and

colleagues (Stankiewicz, Hummel, & Cooper, 1998; Stankiewicz & Hummel, 2002; Thoma, Hummel, & Davidoff, 2004) have argued that a viewpoint-invariant representation such as GSD can only be generated

    when one attends to an image. Attention allows one to identify the parts of an object and bind them to

    features and relations (Treisman & Gelade, 1980), thus generating a structural description. If an

    individual is exposed to an image but fails to attend to it, they will generate a more holistic, viewpoint-

    specific representation.

In their experiments, two objects are displayed on the screen together. Participants are cued to attend to one, while they are given very little time to notice the other; typically, the two objects together are displayed for only 120 ms. Afterwards, participants are shown another object and asked to name it. The experimenters predict that if an individual attends to an object, they will generate a viewpoint-invariant representation that will prime them for naming that object again, even if the object is transformed. However, if an individual doesn't attend to an object, they will generate only a viewpoint-specific representation. This representation will support priming across a much narrower set of transformations.

Hummel and colleagues found that when participants attended to an object, they were primed to

    recognize it again, even if the original object was mirror-reflected (Stankiewicz, Hummel, & Cooper,

    1998), or scrambled (Thoma, Hummel, & Davidoff, 2004) by splitting the image into two halves and

    inverting them. Importantly, the priming for both mirror-reflected and scrambled images was greater than

    the priming for seeing a different object from the same class (e.g., two different pianos), so this was not

merely semantic priming.

In contrast, an ignored image failed to prime mirror-reflected or unscrambled versions of itself,

    although it did prime an identical version, as well as larger or smaller versions (Stankiewicz & Hummel,

    2002). Curiously, a scrambled, ignored image failed to prime even an identical scrambled image (Thoma,

    Hummel, & Davidoff, 2004). This, along with the size invariance, suggests that the ignored image does


    not simply produce a pixel-by-pixel representation. Some amount of abstraction or recognition appears to

    occur even for ignored images, although it is insufficient for supporting transformations such as mirror

    reflection.

    Characterizing Qualitative Representations

    We have now seen strong evidence for a qualitative/quantitative distinction in visual representations,

    as well as reasonably strong evidence for a viewpoint-invariant/viewpoint-specific distinction. However,

    two questions remain: 1) Are the qualitative and the quantitative simply two sets of features in a single

    representation, or are there real differences in the ways they are represented and reasoned over? 2) How

    closely tied together are the qualitative representation and the viewpoint-invariant representation? To

    better answer these questions, I will now make some claims about the nature of qualitative

    representations.

    1) Qualitative representations appear linked to symbolic processing in the brain

    Qualitative features (e.g., blue, above, or parallel) often correspond to single words, whereas

    quantitative features (e.g., 3 inches to the right) often do not. Therefore, if qualitative and quantitative

    information are represented separately in the brain, one might suppose that the left hemisphere, known for

    language-processing (at least in right-handed ind