7/27/2019 Lovett Thesis Final
Copyright by Andrew Lovett 2012
All Rights Reserved
ABSTRACT
Spatial Routines for Sketches: A Framework for Modeling Spatial Problem-Solving
Andrew Lovett
Spatial problem-solving tasks are often used to evaluate people's cognitive abilities. For example,
Raven's Progressive Matrices is a popular intelligence test. In it, an individual is shown an array of
two-dimensional images, with one image missing. The individual must compare the images and identify a
pattern of differences between them, in order to solve for the missing image. Performance on tasks such
as Raven's and geometric analogy ("A is to B as C is to...?") correlates strongly with performance on
many other ability tasks, in the spatial, verbal, and mathematical domains. Thus, these tasks appear to
depend on core, general-purpose representations and processes. However, it is as yet unclear what those
representations and processes are.
To better understand these tasks, we developed Spatial Routines for Sketches (SRS), a general
framework for modeling spatial problem-solving. SRS is based on a set of psychological claims about
how people perform spatial problem-solving: 1) When possible, people use qualitative representations
describing features such as relative position or orientation, rather than exact numerical values. 2) Spatial
representations are hierarchical. A given image might be represented as object groups, individual objects,
or the parts within each object. 3) Qualitative spatial representations can be compared via structure-
mapping. Structure-mapping involves aligning the relational structure in two representations to find the
corresponding elements.
Three task models were built within the SRS framework: geometric analogy, Raven's Progressive
Matrices, and the oddity task, in which one sees a set of images and picks the one that is different. The
three task models use identical representations and similar processes. Thus, they allow us to test the
generality of the psychological claims, as well as the representations and processes that implement these
claims.
TABLE OF CONTENTS
1. Introduction
1.1 Motivation
1.2 Background
1.3 Claims
1.4 Contributions
1.5 Evaluation
1.6 Outline of the Thesis
2. Background
2.1 Hybrid Representations
2.2 Hierarchical Representations
2.3 Structural Comparison
2.4 Mental Rotation: An Example Domain
2.5 Strategic Comparison
3. Modeling Perception and Comparison
3.1 Existing Models
3.2 Perceptual Sketchpad
3.3 Image Comparison
3.4 Shape Comparison
3.5 Perceptual Reorganization
4. Spatial Routines for Sketches
4.1 Operation Categories
4.2 Spatial Operations
4.3 Other Operations
4.4 Spatial Routine Language
4.5 Gathering Data
5. Simulations
5.1 Representation
5.2 Task Comparison
5.3 Parameters and Sensitivity Analysis
5.4 Analyses
6. Geometric Analogy
6.1 Background
6.2 Model
6.3 Model Predictions
6.4 Behavioral Experiment
6.5 Simulation
6.6 Related Work
6.7 Conclusion
7. Raven's Progressive Matrices
7.1 Background
7.2 Model
7.3 Behavioral Experiment
7.4 Simulation
7.5 Discussion
7.6 Related Work
8. The Oddity Task
8.1 Background
8.2 Model
LIST OF FIGURES
Figure 1.1: Geometric analogy problem
Figure 1.2: Three spatial problem-solving tasks
Figure 1.3: An oddity task problem involving groups of objects
Figure 1.4: Two related geometric analogy problems, along with average reaction times and overall accuracies for human participants
Figure 2.1: Two oddity task problems from Dehaene et al. (2006). Pick the image that doesn't belong
Figure 2.2: Stimuli from (A) Navon (1977, Experiment 4), and (B) Love, Rouder, and Wisniewski (1999)
Figure 2.3: Two images that one might compare
Figure 2.4: Mental rotation stimuli from (A) Shepard and Metzler (1971), and (B) Cooper and Shepard (1973)
Figure 2.5: Reaction time results from Biederman and Bar (1999, Experiment 2)
Figure 3.1: Example stimuli from Love, Rouder, and Wisniewski (1999)
Figure 3.2: The CogSketch sketch understanding system
Figure 3.3: Sketch of a house
Figure 3.4: Examples of texture patches
Figure 3.5: Examples of space objects
Figure 3.6: Grouping examples
Figure 3.7: Examples of parallel edges, parallel objects, and parallel groups
Figure 3.8: Example representation with all expressions about Edge-4 in a parallelogram
Figure 3.9: A rotation between two triangles
Figure 3.10: Two images that might be compared, with the objects labeled
Figure 3.11: Two ambiguous mappings
Figure 3.12: Object pairs whose textures would be considered identical
Figure 3.13: Shape change examples
Figure 3.14: Shape rotation example
Figure 3.15: A) Spatial reflections. B) Shape reflections
Figure 3.16: Examples of lengthening
Figure 3.17: Example of part lengthening
Figure 3.18: Examples of part addition
Figure 3.19: Example of a subshape deformation
Figure 3.20: Corresponding edges for a symmetry mapping
Figure 3.21: Example of a scaled group
Figure 3.22: Complex reorganization facilitates this comparison
Figure 4.1: An ambiguous image
Figure 4.2: A: Oddity task problem from Dehaene et al. (2006). B: Raven's Matrix-type problem
Figure 4.3: Geometric analogy problems
Figure 4.4: A geometric analogy problem, as it looks in CogSketch
Figure 4.5: A Raven's Progressive Matrices problem (not from the actual test)
Figure 4.6: A row of images for difference-finding
Figure 4.7: Raven's Progressive Matrix problems requiring a literal pattern of variance
Figure 4.8: A row of images with first-to-last matches
Figure 4.9: Oddity task problems with orientation differences
Figure 4.10: A: A geometric analogy problem. B: The patterns of variance
Figure 4.11: Examples of Infer-Shape with reflections
Figure 4.12: Examples of Infer-Shape with deformations
Figure 4.13: Examples of Infer-Shapes with texture transformations
Figure 4.14: A geometric analogy problem
Figure 4.15: A geometric analogy problem with an added object
Figure 4.16: Four 3x3 Raven's Matrix problems
Figure 4.17: A geometric analogy problem with an unclear rotation
Figure 4.18: A Raven's Matrix problem requiring complex perceptual reorganization
Figure 4.19: A Raven's Matrix problem requiring texture detection
Figure 4.20: A Bongard problem from Bongard (1970)
Figure 4.21: The Routine Inspector on a geometric analogy problem
Figure 5.1: A: Geometric analogy problem (Evans, 1968). B: Raven's Matrix problem. C: Oddity task problem (Dehaene et al., 2006)
Figure 5.2: Each internal edge of a textured object was added as a separate object
Figure 6.1: A geometric analogy problem
Figure 6.2: Image pairs requiring perceptual reorganization
Figure 6.3: Strategy for finding differences between two images (see Appendix B for its implementation in the Spatial Routine language)
Figure 6.4: Geometric analogy problems requiring second-order comparison
Figure 6.5: The answer chosen depends on whether one prefers canonical reflections or rotations
Figure 6.6: Reflecting the B shape will produce an identical B shape, as in Figure 6.4B
Figure 6.7: The dot to be removed is in different locations in A and C
Figure 6.8: Problems 1-3 (times are seconds required for human participants to pick an answer; values below answers are the percentage of participants who picked each answer)
Figure 6.9: Problems 4-6
Figure 6.10: Problems 7-9
Figure 6.11: Problems 10-12
Figure 6.12: Problems 13-15
Figure 6.13: Problems 16-18
Figure 6.14: Problems 19-20
Figure 7.1: A matrix problem
Figure 7.2: Example problems for various Carpenter et al. (1990) rules
Figure 7.3: This matrix problem requires complex perceptual reorganization to solve
Figure 7.4: Two 1x1 matrix problems
Figure 7.5: Solution strategies for the above problems
Figure 7.6: Problems where computing the pattern of variance may be difficult
Figure 7.7: Subroutine for finding differences in a row (see Appendix C for its implementation in the Spatial Routine language). Continued in Figure 7.8
Figure 7.8: Subroutine for finding differences in a row. Continued from Figure 7.7
Figure 7.9: Different ways two textures might overlap
Figure 8.1: Oddity task problem from Dehaene et al. (2006). The image without parallel lines is the odd one out
Figure 8.2: C and D can only be solved by considering edges
Figure 8.3: Certain information can be filtered out
Figure 8.4: B and C require a different similarity measure
Figure 8.5: One of the six problems the model failed to solve. Average human performance: 68% (American adults), 86% (Mundurukú)
Figure 8.6: Problems with closed and open shapes
Figure 8.7: Problems relying on (A) shape transformation and (B) shape symmetry
Figure 8.8: Problems requiring shape comparison between images
Figure 8.9: Problems involving quantitative features
Figure 8.10: Problem in which the quantitative difference is particularly salient
Figure 9.1: Two image pairs. It is harder to align the same-shaped objects in B than in A
Figure 9.2: Can you spot the difference between each image pair?
Figure 9.3: Can you spot the difference between the more complex image pairs?
Figure 9.4: Object-level differences (A, B) may be more salient than edge-level differences (C)
Figure 9.5: Problems involving shape rotations
Figure 9.6: An oddity task problem (A) and a Raven's matrix problem (B) with identical images
Figure 9.7: This geometric analogy problem requires quantitative information to solve
Figure 9.8: This geometric analogy problem presents a novel challenge
Figure A.1: Two images that might be compared
LIST OF TABLES
Table 3.1: Orientation-invariant qualitative vocabulary for edges
Table 3.2: Additional orientation-specific vocabulary for edges
Table 3.3: Object-level qualitative vocabulary. Terms marked with an O are orientation-specific
Table 3.4: Candidate inferences between images 10A and 10B
Table 3.5: Types of shape comparisons
Table 4.1: Pattern of variance for the first two images in Figure 4.6
Table 4.2: Generalized forms for predicates in difference representations
Table 4.3: A template and examples of operation calls
Table 4.4: A template and examples of control structures
Table 4.5: A spatial routine for solving Bongard problems
Table 4.6: Pattern of variance for Figure 4.20
Table 6.1: Linear model for human reaction times on geometric analogy
Table 7.1: Linear regression for accuracy of Northwestern students
Table 7.2: Second linear model for accuracy of Northwestern students
Table 7.3: Third linear model for accuracy of Northwestern students
Table 7.4: Linear model for reaction times of Northwestern students (in seconds)
Table 8.1: Accuracy of the model and each participant group on the 45 oddity task problems
Table 8.2: Correlations in accuracy on each of the 45 problems (Pearson's r)
Table 8.3: Rankings of the 6 problems the model failed to solve (1 = easiest, 45 = hardest)
Table 8.4: Linear models for each group's accuracy
Table 8.5: Linear models for each group's accuracy, with Elems2
Table 8.6: Linear models for each group's reaction times (in s)
1. Introduction
Spatial problem-solving tasks are a popular tool for evaluating people's cognitive abilities. For
example, in geometric analogy (Figure 1.1), an individual is shown an array of images and asked, "A is to
B as C is to...?" Like all the tasks I am studying, geometric analogy requires: 1) building up
representations of two-dimensional images; 2) comparing those representations; and 3) identifying a
pattern across them. Thus, these tasks depend critically on one's ability to encode spatial relations within
an image, compare images, and abstract out higher-order relations between them based on what is
common or different.
Figure 1.1 Geometric Analogy problem.
In the past, spatial problem-solving has been used to evaluate various abilities, from geometric
knowledge (Dehaene et al., 2006) to general intelligence (Raven, Raven, & Court, 1998). However, there
is disagreement about what exactly a particular task evaluates (e.g., geometric analogy: Sternberg, 1977;
Mulholland, Pellegrino, & Glaser, 1980; Raven's Progressive Matrices: Carpenter, Just, & Shell, 1990;
Primi, 2001). I believe we need a more concrete understanding of the representations and processes
people use to solve these tasks. Applied to a single task, this might illuminate the abilities that separate
one person's performance from another's. Applied across tasks, it should help explain how people reason
about space and spatial relations.
For my thesis, I have constructed Spatial Routines for Sketches (SRS), a general framework for
building, evaluating, and comparing spatial problem-solving task models. By task model, I mean an
end-to-end model of human performance on a task, which begins with visual input, generates a representation,
reasons over the representation, and chooses an output behavior. While the SRS models may vary in their
specific strategies, they are all built upon three core hypotheses:
1) When possible, people use qualitative, or categorical, representations to reason about space (e.g.,
Biederman, 1987; Kosslyn et al., 1989; Forbus, Nielsen, & Faltings, 1991). In particular, they encode the
qualitative spatial relations between elements in a visual scene.
2) Qualitative, structural representations of space are compared via structure-mapping (Gentner,
1983). According to structure-mapping theory, people compare two cases by aligning their common
relational structure, thereby highlighting commonalities and differences across the cases.
3) Spatial problem-solving requires flexibly moving between different levels in a hierarchy of spatial
representations (Palmer, 1977) until a suitable level is found. A level of representation is suitable if,
when the representations are compared, a pattern emerges which can be used to solve the problem. For
example, Figure 1.1 involves the spatial relations between the objects in each image. However, other
problems might require focusing on the edges in a single object, or backing out and considering groups of
objects.
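The first two hypotheses can be illustrated with a toy sketch. It is not the SRS implementation and not a structure-mapping engine: the relation names (leftOf, above), the function names, and the example coordinates are all assumptions made for illustration, and the comparison simply brute-forces the object correspondence that aligns the most relations, assuming both images contain the same number of objects.

```python
from itertools import permutations


def qualitative_relations(objects):
    """Turn object centers {name: (x, y)} into qualitative facts.

    Only two hypothetical relations are encoded (leftOf, above); y grows upward.
    """
    facts = set()
    for a, (ax, ay) in objects.items():
        for b, (bx, by) in objects.items():
            if a != b and ax < bx:
                facts.add(("leftOf", a, b))
            if a != b and ay > by:
                facts.add(("above", a, b))
    return facts


def compare_images(base_objects, target_objects):
    """Brute-force the object correspondence that aligns the most relations.

    A simplified stand-in for structure-mapping; assumes equal object counts.
    """
    base = qualitative_relations(base_objects)
    target = qualitative_relations(target_objects)
    base_names = sorted(base_objects)
    best = None
    for perm in permutations(sorted(target_objects)):
        mapping = dict(zip(base_names, perm))
        # Rewrite the base facts in terms of the target's object names.
        projected = {(rel, mapping[a], mapping[b]) for rel, a, b in base}
        shared = projected & target
        if best is None or len(shared) > len(best["commonalities"]):
            best = {
                "mapping": mapping,
                "commonalities": shared,
                "differences": (projected - target) | (target - projected),
            }
    return best


# Hypothetical images: exact coordinates differ, but the relational structure mostly aligns.
image_a = {"circle": (0, 0), "square": (1, 0)}
image_b = {"dot": (5, 5), "triangle": (9, 8)}
result = compare_images(image_a, image_b)
print(result["mapping"])
print(result["differences"])
```

The point of the sketch is that the comparison operates on relations rather than coordinates: the two images share a leftOf relation despite having no numerical values in common, and the unmatched above relation surfaces as a difference.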
I have two primary goals in modeling spatial problem-solving. The first is to evaluate whether the
above hypotheses are sufficient for explaining human performance. A sufficient task model should meet
two criteria: 1) The model should perform the task with a high degree of accuracy, making no more errors
than a typical human would. 2) When the model does fail, its error patterns should match human error
patterns, i.e., problems that are hard for the model should also be hard for people.
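The two sufficiency criteria can be expressed as a simple check. The sketch below is only one possible reading of them: the function name, the per-problem data, and the use of mean human accuracy as the measure of difficulty are assumptions for illustration, not the analyses reported in the thesis.

```python
def evaluate_task_model(model_correct, human_accuracy):
    """Check two sufficiency criteria on per-problem results.

    model_correct: list of bools, one per problem (did the model solve it?).
    human_accuracy: list of floats in [0, 1], proportion of people solving each problem.
    """
    n = len(model_correct)
    model_acc = sum(model_correct) / n
    human_mean = sum(human_accuracy) / n

    solved = [h for ok, h in zip(model_correct, human_accuracy) if ok]
    failed = [h for ok, h in zip(model_correct, human_accuracy) if not ok]

    def mean(values):
        return sum(values) / len(values) if values else None

    solved_mean, failed_mean = mean(solved), mean(failed)
    return {
        # Criterion 1: the model makes no more errors than a typical human.
        "accurate_enough": model_acc >= human_mean,
        # Criterion 2: problems the model fails are also harder for people.
        # Vacuously true if the model fails nothing or solves nothing.
        "errors_match": (
            failed_mean is None or solved_mean is None or failed_mean < solved_mean
        ),
        "model_accuracy": model_acc,
        "human_mean_accuracy": human_mean,
    }


# Hypothetical data for five problems.
report = evaluate_task_model(
    model_correct=[True, True, True, True, False],
    human_accuracy=[0.9, 0.8, 0.9, 0.7, 0.4],
)
print(report)
```

A fuller analysis would correlate a graded measure of model difficulty with human accuracy or reaction time across all problems; the binary split here is the simplest version of that idea.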
occupations require that individuals be skilled at mentally manipulating representations of images. For
example, when a surgeon is operating, they may be required to mentally construct a three-dimensional
structure from a two-dimensional image.
Importantly, real-life spatial visualization ability can be evaluated by abstract problems like paper-
folding. Several studies have found correlations between performance on abstract tasks and real-world
ability in surgeons (see Hegarty et al., 2007, for a review). Shea, Lubinski, & Benbow (2001) found that
performance of 13-year-olds on spatial visualization tasks predicted what they would study in college and
what eventual job they would get, even controlling for verbal and mathematical ability. Individuals with
higher spatial ability were more likely to be scientists and engineers. This analysis was restricted to the
top 1% of 13-year-olds, but ongoing research (Hedges & Chung, in preparation) suggests the findings
generalize to the rest of the population.
Thus, it appears that abstract, spatial problem-solving tasks tap into a spatial visualization ability that
is useful in many advanced disciplines. If we can understand how people perform spatial transformations
in these tasks, we will be better prepared to educate students in spatial skills, and hopefully enhance their
ability to perform in those disciplines.
General Problem-Solving Ability
Many problem-solving tasks in the spatial, verbal, and mathematical domains have been used to
evaluate people's abilities. In the past, correlations in performance across these tasks caused Spearman
(1923; 1927) to put forward the notion of g, a single, general intelligence measure which could predict an
individual's ability across a large set of task domains. While this thesis is not directly concerned with g,
Spearman's work suggests that some mental abilities are utilized across a wide range of tasks. If spatial
problem-solving tasks tap into those abilities, then our models can give us insights into people's general
reasoning processes.
In fact, there is strong evidence that one of the tasks, Raven's Progressive Matrices (RPM), does tap
into general abilities. RPM was originally designed to evaluate one major component of g, eduction, or
the ability to identify meaningful patterns in confusing data (Raven, Raven, & Court, 1998). Since the
test's creation, several studies have shown it to be one of the highest single-test correlates with g (e.g.,
Burke & Bingham, 1969; Zagar, Arbit, & Friedland, 1980; see Raven, Raven, & Court, 2000b, for a
review), meaning an individual's performance on RPM predicts their performance on many other
intelligence tests. In a multi-dimensional scaling analysis of ability tests, Snow and colleagues (Snow,
Kyllonen, & Marshalek, 1984; Snow & Lohman, 1989) found that RPM lies in the middle of the ability
space. That is, while most ability tests cluster with other tests in the same domain, e.g., verbal,
mathematical, and spatial tests, RPM correlates highly with the most abstract ability tests from each
domain. This suggests that although RPM is a visuo-spatial task, it evaluates several domain-general
mental abilities. My hope, in modeling this task, is to gain a greater understanding of what those abilities
are.
1.2 Background
Two of the spatial problem-solving tasks, geometric analogy and Raven's Progressive Matrices,
have been studied extensively in the past, while research on the oddity task is more limited. In the
following sections I summarize the research, focusing on two questions: 1) What are the processes people
use to perform the task? 2) What factors contribute to the relative difficulty of each problem?
Geometric Analogy
Processes
Geometric analogy (Figure 1.1, Figure 1.2A) has been studied and modeled by many researchers over
the last 50 years. Evans' ANALOGY (1968) was the first computational model of the task. A
sophisticated program for its time, ANALOGY automatically encoded representations of the images in a
problem and could even identify spatial transformations between shapes, such as rotations. Once
representations had been constructed or hand-coded, it solved a problem via what has been termed an
infer-infer-compare strategy (Mulholland, Pellegrino, & Glaser, 1980):
1) Infer a mapping between images A and B, describing what changed between them.
2) For each possible answer n, infer a mapping between images C and n.
3) Compare the A/B mapping to the C/n mapping for each possible answer, and choose the answer for
which this comparison returns the closest match.
ANALOGY's mapping processes are unlike human mapping processes in that they perform an
exhaustive search for the best possible mapping, rather than using heuristics (Gentner, 1983). However,
the infer-infer-compare model remains a reasonable strategy for geometric analogy, and it is one we will
return to.
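To make the three steps concrete, here is a minimal Python sketch of infer-infer-compare. It is an illustration only, not Evans' program or the model developed in this thesis. It assumes that each image is hand-coded as a dictionary of elements with attribute values, that corresponding elements share a key, and that two sets of changes are compared by simple set overlap. The names changes, overlap, and infer_infer_compare are hypothetical.

```python
def changes(src, dst):
    """Infer what changed between two images (elements are matched by key)."""
    found = set()
    for name in src.keys() | dst.keys():
        before, after = src.get(name, {}), dst.get(name, {})
        for attr in before.keys() | after.keys():
            if before.get(attr) != after.get(attr):
                # Element names are dropped so A/B and C/n changes are comparable.
                found.add((attr, before.get(attr), after.get(attr)))
    return found


def overlap(x, y):
    """Jaccard similarity of two change sets (1.0 when both are empty)."""
    return len(x & y) / len(x | y) if x | y else 1.0


def infer_infer_compare(a, b, c, answers):
    """Return the index of the answer whose C/n changes best match the A/B changes."""
    ab = changes(a, b)                                           # step 1
    scores = [overlap(ab, changes(c, n)) for n in answers]       # steps 2 and 3
    return max(range(len(answers)), key=scores.__getitem__)


# Toy problem: a large triangle becomes small; what does a large square become?
A = {"s1": {"shape": "triangle", "size": "large"}}
B = {"s1": {"shape": "triangle", "size": "small"}}
C = {"s1": {"shape": "square", "size": "large"}}
ANSWERS = [
    {"s1": {"shape": "square", "size": "large"}},
    {"s1": {"shape": "square", "size": "small"}},
    {"s1": {"shape": "circle", "size": "small"}},
]
best = infer_infer_compare(A, B, C, ANSWERS)
```

In this toy problem the second answer is selected, because it is the only one whose changes from C exactly match the changes from A to B.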
Sternberg (1977) argued that people actually solve problems like geometric analogy via an
infer-map-apply strategy (Mulholland, Pellegrino, & Glaser, 1980):
1) Infer a mapping between images A and B, describing what changed between them.
2) Compute a mapping between images A and C, identifying their corresponding elements.
3) Based on this mapping, apply the A/B differences to C to compute D, a representation of the
image that would best complete the analogy. This D representation can then be compared to each
possible answer to see which one best matches it.
Sternberg produced evidence that people utilized infer-map-apply on his problem set, while
Mulholland, Pellegrino, & Glaser (1980) argued that people utilized infer-infer-compare on their problem
set. Later, researchers suggested that people adjust their strategy depending on factors such as the
problem's complexity and the relatedness of the items being compared (Bethell-Fox, Lohman, & Snow,
1984; see also Grudin, 1980, on verbal analogies).
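The infer-map-apply strategy can be sketched in the same spirit. This is a simplified Python illustration under stated assumptions, not Sternberg's account or the thesis model: images are hand-coded dictionaries of elements, A and B share element keys and attributes, C has at least as many elements as A, and the A/C mapping is found greedily by counting shared attribute values. All function names are hypothetical.

```python
def infer(a, b):
    """Step 1: per-element attribute changes from A to B (elements matched by key)."""
    return {name: {k: b[name][k] for k in a[name] if b[name][k] != a[name][k]}
            for name in a}


def map_elements(a, c):
    """Step 2: pair each A element with the unused C element sharing the most values."""
    mapping, unused = {}, set(c)
    for name, attrs in a.items():
        best = max(sorted(unused),
                   key=lambda other: sum(attrs[k] == c[other].get(k) for k in attrs))
        mapping[name] = best
        unused.remove(best)
    return mapping


def apply_changes(delta, mapping, c):
    """Step 3a: apply the A/B changes to C's corresponding elements to build D."""
    d = {name: dict(attrs) for name, attrs in c.items()}
    for a_name, change in delta.items():
        d[mapping[a_name]].update(change)
    return d


def infer_map_apply(a, b, c, answers):
    """Step 3b: return the index of the answer sharing the most attribute values with D."""
    d = apply_changes(infer(a, b), map_elements(a, c), c)

    def score(answer):
        return sum(answer.get(n, {}).get(k) == v
                   for n, attrs in d.items() for k, v in attrs.items())

    return max(range(len(answers)), key=lambda i: score(answers[i]))


A = {"outer": {"shape": "circle", "size": "large"}, "inner": {"shape": "dot", "size": "small"}}
B = {"outer": {"shape": "circle", "size": "small"}, "inner": {"shape": "dot", "size": "small"}}
C = {"x": {"shape": "square", "size": "large"}, "y": {"shape": "dot", "size": "small"}}
ANSWERS = [
    {"x": {"shape": "square", "size": "large"}, "y": {"shape": "dot", "size": "small"}},
    {"x": {"shape": "square", "size": "small"}, "y": {"shape": "dot", "size": "small"}},
    {"x": {"shape": "circle", "size": "small"}, "y": {"shape": "dot", "size": "small"}},
]
choice = infer_map_apply(A, B, C, ANSWERS)
```

Unlike infer-infer-compare, this sketch constructs a single predicted image D and only then consults the answers, so the cost of the comparison stage does not grow with the number of A/B-style mappings that must be inferred.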
These early researchers were focused on the particular strategy, or the set of cognitive components,
that a person would use to solve a geometric analogy. However, they were less clear on what specific
mechanisms might be used, for example, to compare images A and B. Only Evans (1968) had a complete
computational model. While computational models of geometric analogy have become more popular in
recent years (e.g., Bohan & O'Donoghue, 2000; Schwering et al., 2009), I believe no other researchers
have built an end-to-end computational model that could be compared against human performance.
Factors Contributing to Difficulty
The above researchers also analyzed the factors contributing to each problem's relative difficulty. In
generating their stimuli, Mulholland, Pellegrino, & Glaser (1980) independently varied the number of
elements in each image and the number of transformations between images, e.g., the number of
differences between images A and B. They found that as either number of elements or number of
transformations increased, the reaction time increased, while the accuracy decreased. Importantly, for
reaction time the effects of elements and transformations were not simply additive; there was a
particularly great cost when both number of elements and number of transformations increased. The
researchers believed this was due to working memory load. As the number of elements or
transformations increased, the working memory load also increased. For particularly large numbers of
both, working memory load exceeded people's capacity, forcing them to change their problem-solving
strategy, and resulting in substantial reaction time costs.
Bethell-Fox, Lohman, & Snow (1984) independently varied several factors in their geometric analogy
problems: number of elements in the images, number of transformations between images, figural vs.
spatial transformations, number of possible answers to choose from, and similarity of distractors to the
correct answer. They found that all these factors correlated with reaction times, while all factors except
number of transformations correlated with error rates. They also found some interesting interactions
between the factors: for example, those problems with the most images and the most possible answers
were significantly harder than other problems. Again, this may have been because the large memory load
required a strategy shift.
3) Distribution of three values: There are three different elements in the three images of the row.
Every row must contain each of those three elements, but their order varies.
4) Figure addition or subtraction: If the first image contains element X and the second image contains
element Y, the third image must contain elements X and Y. Subtraction is similar.
5) Distribution of two values: There is an element that is present in two of the images but not the
third.
For example, to solve Problem 2B, one would study the images in the first row, note that the groups
of three squares correspond to each other, and recognize that the relation between these groups can best
be described by a quantitative pairwise progression rule: the squares are becoming smaller. One would
then select the answer that best fit this rule in the bottom row.
Carpenter, Just, and Shell's (1990) first model, FAIRAVEN, could perform APM about as well as
the average participant from their subject pool. The second model, BETTERAVEN, performed at the
level of the best participants. BETTERAVEN performed better due to a couple of key changes: 1)
BETTERAVEN could identify cases where there was an element in only two of the three images in a
row, meaning it could use the distribution of two values rule; 2) BETTERAVEN had better goal
management. It identified candidate rules one at a time, and it backtracked when a candidate rule proved
ineffective for solving a problem.
Based on the difference between the models, the experimenters suggested that the ability to manage
goals is a key factor in solving the hardest problems. They saw this as being linked to working memory: a
complex problem requires that one manage a hierarchy of goals, and keeping them all in memory can be
quite difficult.
In their analysis, the researchers noticed another interesting difference between high- and low-
performers on the task. In some cases, there was more than one rule that might be applied to solve a
problem. For example, suppose the three corresponding elements were an arrow pointed up, an arrow
pointed right, and an arrow pointed down. This set of elements could be seen as either a quantitative
pairwise progression, in which an arrow shape gradually rotates, or a distribution of three, with three
different shapes. While either of these rules is sufficient, the first rule is better, as it is more compact.
That is, one doesn't have to remember all three shapes individually. On problems such as these, verbal
protocols showed that higher performers used the quantitative pairwise progression rule more often than
lower performers.
Carpenter et al. suggested the above result occurred simply because higher performers were more
consistent about looking for the simpler rules before they looked for the more complex rules. However, I
think it is likely that all participants looked for the simpler rules first. I interpret these results as
suggesting, rather, that higher performers are better at identifying more abstract relationships between
elements. Whereas the lower performers saw the arrows as different shapes, the higher performers were
better able to compare the arrows, perform a spatial transformation, and identify a rotation between the
arrows' shapes. Identifying more abstract relations on problems like these would reduce working
memory load and generally aid in solving the problems.
Carpenter et al. failed to address a couple of important components of human processing with their
models. Firstly, the models did not generate representations from the images; all representations were
hand-coded based on participant descriptions. The experimenters suggested the visual encoding process
was irrelevant, given how well performance on the task correlates with other, non-visual tasks. However,
this assumes visual encoding ability does not generalize to encoding ability in other modalities. I believe
RPM taps into a general ability to encode a stimulus at the appropriate level of abstraction. Indeed, the
authors themselves suggest that an ability to decrease working memory load by using more abstract
representations might explain the correlation between RPM and the Towers of Hanoi problem, a very
different task.
Secondly, the FAIRAVEN and BETTERAVEN models did not learn the five rules described above.
Rather, abstract forms of all the rules were hard-coded into the models. Thus, the models fail to explain
how individuals begin with basic comparisons of pairs of images and develop, over time, an
understanding of a complex rule that holds across a row of images.
Factors Contributing to Difficulty
Two groups of researchers (Vodegel-Matzen, van der Molen, & Dudink, 1994; Embretson, 1998)
have evaluated Carpenter, Just, and Shell's (1990) analysis by creating their own, experimental matrix
items in which they varied the number and type of Carpenter rules. Both groups found that a rule
complexity measure, based on the number and difficulty of rules, correlated highly with problem
difficulty: problems with more rules, and with more difficult rules, are harder to solve. This is likely
because such problems both put a greater load on working memory and require more sophisticated
problem-solving techniques. I believe a computational model which makes those problem-solving
techniques explicit is needed to gain a better grasp on what exactly makes more complex problems
harder.
Primi (2001) designed new matrix items in which he independently varied the number of elements in
each image, the number of rules describing a row of images, the complexity of rules, and perceptual
organization. Perceptual organization referred to how difficult it was to identify the corresponding
elements in a row of images. In his low perceptual organization problems, the correspondence-finding
component was made difficult through misleading cues that encouraged test-takers to put the wrong
elements into correspondence. Perhaps unsurprisingly, he found that difficulty of correspondence-finding
was the greatest predictor of problem difficulty.
The official RPM problems, unlike the problems in Primi's test set, were not explicitly designed to
make correspondence-finding difficult. Nonetheless, there are some problems which invite inappropriate
correspondences (Carpenter, Just, & Shell, 1990). In such cases, I suspect the most important skill is an
ability to backtrack and look for alternate mappings when a set of correspondences proves insufficient for
solving a problem. Thus, dealing with incorrect correspondences, like dealing with excessive memory
load, may require an ability to evaluate and dynamically modify problem-solving strategies.
Oddity Task
Processes
The oddity task (Figure 1.2B) is a set of 45 problems which Dehaene et al. (2006) constructed to
evaluate people's geometric knowledge. The experimenters hypothesized that the problems tap into core
knowledge of geometry (Spelke & Kinzler, 2007), an innate, universal cognitive module. This module
would presumably understand key geometric concepts like perpendicular lines, and would therefore be
used to solve problems like Figure 1.2B.
To test the universality of geometric knowledge, Dehaene et al. gave their 45 problems to participants
of varying ages from two cultural groups: North Americans and the Mundurukú, a South American
indigenous group. They found that the Mundurukú performed above chance on nearly all the problems,
despite their lack of formal schooling in geometry and mathematics. Furthermore, there was a significant
correlation between the American and Mundurukú error patterns. The experimenters took this as
evidence that the two groups were working with a shared, universal set of core geometric knowledge.
As this work is relatively new, no other researchers have explored the processes people might utilize
in performing the task. However, I have suggested people likely use processes similar to those used in the
previous tasks (Lovett & Forbus, 2011). Whereas geometric analogy involves comparing two images
and noticing their differences, the oddity task involves comparing images and noticing their
commonalities. Later, I will argue that both of these comparisons can best be modeled as structure-
mapping processes (Gentner, 1983).
Factors Contributing to Difficulty
The oddity task problems (Dehaene et al., 2006) were designed to evaluate understanding of two-
dimensional spatial concepts. As such, they vary greatly in the type and complexity of the spatial
concepts they test for. Several of the more difficult problems involve rotations or reflections between
shapes. Thus, as with geometric analogy, people may have more difficulty with problems that require
spatial visualization.
Importantly, these problems do not merely rely on one's ability to mentally rotate shapes. They also
require that one notice that there is a possible shape transformation, and then put in the effort to compute
the transformation. Thus, there may be a motivational element. Recall that on the RPM, higher-
performing participants were more likely to notice transformations like a rotation of an arrow shape, even
when the transformation wasn't strictly necessary for solving the problem. I believe that across all three
tasks, recognition of spatial transformations depends on: 1) spatial visualization ability, and 2) a general
intellectual interest in comparing elements and looking for more abstract relationships between them.
1.3 Claims
My thesis rests on two broad sets of claims. The first set is about the representations and processes
we can use to model human spatial problem-solving. The second set is about what we can learn from
these models, in terms of the factors that contribute to a problem's difficulty and the processes that
explain variation in human performance.
Modeling Spatial Problem-Solving
The studies described above tell us a little about the processes, and less about the representations, that
people utilize during spatial problem-solving. There are several open questions that must be resolved to
model the tasks from end to end. I propose the following hypotheses about human representations and
processes:
1) When possible, people utilize qualitative representations of the spatial relations between elements
in an image (e.g., Biederman, 1987; Kosslyn et al., 1989; Forbus, Nielsen, & Faltings, 1991) when
reasoning about space.
2) These qualitative representations are hierarchical (Palmer, 1977). That is, one can reason about
relations between objects in an image, or one can focus in and reason about relations between parts of an
individual object, or one can zoom out and reason about relations between groups of objects.
3) Qualitative, relational representations are compared via structure-mapping (Gentner, 1983, 1989), a
domain-general process of structural alignment. Structure-mapping can identify commonalities and
differences in two representations, or it can determine their similarity.
4) The computation of spatial transformations, such as rotations between shapes, is accomplished via
an interaction between hierarchical representations and structure-mapping. To compute a transformation
between two shapes, an individual uses structure-mapping over representations of the shapes' parts to
identify corresponding parts. Then, a spatial transformation is applied to put the corresponding parts into
alignment.
5) Spatial problem-solving requires control processes which can look over a problem, apply a
strategy (where a strategy consists primarily of a set of comparisons via structure-mapping) and, when a
strategy fails, backtrack and attempt a new strategy.
In the following subsections, I briefly justify each hypothesis and then outline how it can be modeled.
1) Qualitative Representations of Spatial Relations
There is good evidence that people are sensitive to the qualitative, or categorical, relations between
objects in a visual scene. For example, parallel lines (Abravanel, 1977) and concave corners between
edges of a shape (Bhatt et al., 2006) are particularly salient to people, suggesting that we make a
qualitative distinction between parallel and non-parallel, or between concave and convex. In some cases,
people go out of their way to identify a qualitative relation between objects: when people are asked to
memorize the location of a dot in a circle, there is evidence they mentally divide the circle into four
quadrants and then qualitatively encode which of the four quadrants the dot was located in (Huttenlocher
et al., 1991).
correspondences between elements in the two cases; 2) a structural evaluation score, a measure of the
similarity of the cases based on the systematicity of the mapping; and 3) candidate inferences, inferences
about the target based on elements in the base that failed to map to the target. Candidate inferences can
be used to identify differences between the base and the target.
While structure-mapping was originally proposed to explain abstract analogies, there has been
mounting evidence that it can also explain people's concrete comparisons of visual stimuli (Markman &
Gentner, 1996; Lovett et al., 2009a; Sagi, Gentner, & Lovett, in press). I believe it may play a ubiquitous
role in spatial problem-solving because of its usefulness at various stages in the problem-solving process.
For example, the infer-infer-compare strategy for geometric analogy, described above, relies on both
computing the differences between images and comparing two sets of differences. SME can do both
these things, as the sets of differences computed by SME can themselves be compared via SME in a
second-order comparison (Lovett et al., 2009b).
4) Structure-Mapping in Spatial Transformations
One of my goals is to better understand how people perform spatial visualization, particularly how
they compute transformations between shapes. Work on mental rotation (Shepard & Metzler, 1971;
Shepard & Cooper, 1982) has shown that the time required to determine that one shape is a rotation of
another is proportional to the degrees of rotation between them. This suggests that people perform this
task by mentally rotating their representation of one shape to align it with the other. If so, people cannot
be working solely with a qualitative shape representation, as described above. Qualitative representations
would not include the absolute orientations of shape parts, so there would be no need to transform them in
an analog fashion.
I believe mental rotation is performed on a quantitative, orientation-specific shape representation.
While the exact form of this representation is unclear, it can be modeled simply as the set of edges in a
shape, along with each edge's quantitative orientation, length, and location. However, a mystery remains.
Typically, the time to perform a mental rotation is proportional to the degrees of rotation along the
shortest possible rotation between the two shapes. How do people know which way to rotate one shape to
quickly align it with the other?
To answer this question, I have proposed a three-stage model of mental rotation:
1) Compare qualitative edge-level representations via structure-mapping. Identify the corresponding
edges.
2) Take a single pair of corresponding edges and calculate the shortest possible rotation between
them.
3) Apply this rotation to the full set of quantitative edges in the first shape. Evaluate whether this
aligns those edges with the edges in the second shape.
This approach can identify other transformations, such as reflections, as well. I describe the approach
in detail in Chapter 3.
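The three stages can be illustrated with a short Python sketch. This is a simplified stand-in rather than the implementation described in Chapter 3. It assumes that each edge is a directed pair of endpoints, that both shapes are expressed relative to a shared center at the origin, and that stage 1 (structure-mapping over qualitative edge representations) has already produced the list of corresponding edge indices. The function names are hypothetical.

```python
import math


def orientation(edge):
    """Quantitative orientation of a directed edge, in degrees."""
    (x1, y1), (x2, y2) = edge
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def shortest_rotation(edge_a, edge_b):
    """Stage 2: signed angle in [-180, 180) turning edge_a's orientation into edge_b's."""
    return (orientation(edge_b) - orientation(edge_a) + 180.0) % 360.0 - 180.0


def rotate_edge(edge, degrees):
    """Rotate both endpoints of an edge about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return tuple((x * c - y * s, x * s + y * c) for x, y in edge)


def find_rotation(shape_a, shape_b, correspondences, tol=1e-6):
    """Stage 3: apply the rotation to every edge of shape_a and test the alignment.

    Returns the rotation in degrees, or None if the edges do not line up.
    """
    i, j = correspondences[0]           # a single pair of corresponding edges
    angle = shortest_rotation(shape_a[i], shape_b[j])
    for i, j in correspondences:
        rotated = rotate_edge(shape_a[i], angle)
        for (x, y), (u, v) in zip(rotated, shape_b[j]):
            if abs(x - u) > tol or abs(y - v) > tol:
                return None
    return angle


# An L-shaped pair of edges, the same pair turned 90 degrees, and a mirror image.
SHAPE_A = [((0, 0), (2, 0)), ((2, 0), (2, 1))]
SHAPE_B = [((0, 0), (0, 2)), ((0, 2), (-1, 2))]
MIRROR = [((0, 0), (0, 2)), ((0, 2), (1, 2))]
PAIRS = [(0, 0), (1, 1)]                # assumed output of stage 1
```

Because the rotation is computed from one pair of corresponding edges, the sketch never searches over candidate angles; the mirror-image shape is rejected at stage 3 because its second edge fails to align.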
5) Control Processes
I have argued that structure-mapping can account for much of the processing in spatial problem-
solving. However, there must also be a control process which chooses a strategy, oversees the application
of that strategy, and selects a new strategy if the original fails to produce a solution. New strategies may
be necessary when a particularly complex problem overloads the working memory (Mulholland,
Pellegrino, & Glaser, 1980; Bethell-Fox, Lohman, & Snow, 1984), or when a tricky problem causes one
to identify the wrong corresponding elements (Primi, 2001).
[Figure 1.4: Two related geometric analogy problems, along with average reaction times and
overall accuracies for human participants. Problem A: average RT 6.7 s, accuracy 100%. Problem B:
average RT 26.7 s, accuracy 56%.]
For example, consider the geometric analogy problems in Figure 1.4. While these problems are
nearly identical, the second one is significantly harder (Lovett et al., 2009b) because the obvious mapping
between images A and B, in which the large triangle in A corresponds to the large triangle in B, is
incorrect. Thus, the problem requires backtracking and identifying an alternate, less obvious mapping, in
which the small triangle in A corresponds to the big triangle in B. Clearly, this type of backtracking is
difficult for people.
I have built control processes directly into Spatial Routines for Sketches. A spatial routine can
include loops that iterate through multiple strategies until a sufficient solution is found. A sufficient
solution may be one whose SME mapping receives a reasonably high score, or one whose SME
comparison contains no differences. Spatial routines can also contain nested loops, allowing multiple
combinations of strategies to be attempted. Thus, a spatial routine is much like a computer program
(Ullman, 1987). Note that I do not model working memory capacity. Because there are no limits to a model's working
memory, it will never fail to solve a problem because too many things must be kept in memory.
Similarly, it will never forget the previously completed steps of a problem and need to rework them. I
view the modeling of working memory capacity as future work which can be attempted when spatial
working memory is better understood.
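The strategy loop described above can be sketched as ordinary program control flow. The Python below is a hypothetical illustration, not SRS code: strategies are plain functions, the sufficiency test is a score threshold, and the scores and answer numbers are invented for the example.

```python
def run_routine(strategies, is_sufficient):
    """Try each strategy in order; stop at the first sufficient result.

    Returns (result, names_of_strategies_tried); result is None if every strategy fails.
    """
    tried = []
    for name, strategy in strategies:
        tried.append(name)
        result = strategy()
        if is_sufficient(result):
            return result, tried
    return None, tried


# Invented stand-ins for the two mappings discussed for Figure 1.4.
STRATEGIES = [
    ("obvious mapping", lambda: {"answer": 2, "score": 0.4}),
    ("alternate mapping", lambda: {"answer": 4, "score": 0.9}),
]


def good_enough(result):
    """Assumed sufficiency test: a reasonably high mapping score."""
    return result["score"] >= 0.8


result, tried = run_routine(STRATEGIES, good_enough)
```

The length of the returned list of attempted strategies gives a simple count of how much backtracking a problem required.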
Using the Models to Study Human Performance
Once models of spatial problem-solving have been developed, we can use them to study human
performance on the tasks, identifying the factors that make one problem easier or harder than another.
While several other researchers have considered this issue, an end-to-end task model allows us to be
much more explicit about how a particular factor puts more load on the model's representations and
processes. The models can also be used to automatically tag each problem with information such as the
number of elements that must be represented, the number of operations that must be applied to solve the
problem, etc. Thus, a model can code a problem more objectively than if we simply hand-code each
problem stimulus based on our best guesses about what the problem requires.
Once problems are coded for different factors, we can build mathematical models of performance
matching particular individuals, or groups, in order to determine how individuals vary in their ability to
handle different sources of difficulty. In this way, we can hopefully develop a better understanding of
commonalities and differences in people's spatial reasoning ability.
In my analysis, I focus on three broad classes of factors: encoding & abstraction, working memory
load, and control processes. I describe each of these next.
Encoding & Abstraction
Encoding & abstraction refers to the ability to encode representations at the appropriate level of
abstraction for solving a problem. Some problems may be more difficult because they require
representations that people are less inclined or less able to compute. I am interested in two forms of
abstraction, which I call entity abstraction and relational abstraction.
Problems that are high on entity abstraction require the solver to encode elements at a particular level
in the representational hierarchy. This might be either a larger scale or a smaller scale than the most
comfortable level for an individual. To demonstrate, let us consider a simple form of analogical problem-
solving: the letter string analogy (Mitchell, 1993; Hofstadter, 1995). The simplest letter string analogy is:
abc : abd :: rst : ?
Here, the obvious solution is rsu, based on incrementing the final letter by one. Suppose instead we
have:
abc : abd :: rrsstt : ?
While an obvious solution here would be rrsstu, a better one might be rrssuu. Discovering this
solution depends on entity abstraction. One must recognize that the appropriate level of abstraction for
representing the rrsstt letter string is as two-letter groups. The entire tt group in rrsstt maps to the
letter c in abc.
On the other hand, problems that are high on relational abstraction require the solver to notice more
complex relations between elements. Often, this involves comparing elements within a single image (or
letter string) to recognize commonalities and differences among them, before one compares across images
(or letter strings). For example, consider the problem:
abc : abd :: hkkuuu : ?
Here, an obvious solution might be to apply the letter successor relation and get hkkuuv or
hkkvvv. However, if one spends more time considering the relationships between the elements in
hkkuuu, one should recognize that this letter string contains a new kind of successor relationship: group
length. One might then choose to increment this successor relationship instead, to get hkkuuuu.
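The alternative solutions to these letter string analogies can be reproduced with a few lines of Python. This sketch only illustrates the three levels of description discussed above; it is not a model of how people choose among them. The function names are hypothetical, and the code assumes the final letter is not z.

```python
from itertools import groupby


def runs(s):
    """Group a string into (letter, run length) pairs, e.g. 'hkk' -> [('h', 1), ('k', 2)]."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def next_letter(ch):
    """Alphabetic successor (assumes ch is not 'z')."""
    return chr(ord(ch) + 1)


def increment_last_letter(s):
    """Letter-level reading: replace the final letter with its successor."""
    return s[:-1] + next_letter(s[-1])


def increment_last_group_letter(s):
    """Entity abstraction: treat each run as one element and increment the last run's letter."""
    *head, (ch, n) = runs(s)
    return "".join(c * k for c, k in head) + next_letter(ch) * n


def increment_last_group_length(s):
    """Relational abstraction: the successor relation holds over run length instead."""
    *head, (ch, n) = runs(s)
    return "".join(c * k for c, k in head) + ch * (n + 1)
```

Applied to rrsstt, the letter-level function yields rrsstu and the group-level function yields rrssuu. Applied to hkkuuu, the three functions yield hkkuuv, hkkvvv, and hkkuuuu respectively.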
Of course, entity and relational abstraction occur in slightly different forms in spatial problem-
solving. Entity abstraction is the ability to build a representation at the level of groups of objects, objects,
or parts of an object, as the problem demands. Relational abstraction is the ability to recognize complex
relations, particularly spatial transformations such as rotations or reflections. I use the letter string
examples to illustrate that these abilities are not specific to the domain of spatial problem-solving. Entity
and relational abstraction may rely on a general inclination to try out different abstractions for
representing stimuli.
Unfortunately, my analysis is insufficient for distinguishing between the inclination and the ability to
form abstractions. Thus, relational abstraction refers to both an interest in comparison and abstraction,
and an ability to compute spatial transformations between shapes, i.e., spatial visualization ability.
The simplest way to determine whether a certain problem requires entity or relational abstraction is
via ablation. That is, one can temporarily remove the model's ability to generate representations at a
particular level, or to compute spatial transformations, and check whether the model now fails on the
problem.
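The ablation procedure can be expressed as a small test harness. The Python below is a sketch under stated assumptions: the ability names are illustrative labels for the three representational levels and for spatial transformations, and toy_solve is a hypothetical placeholder that simply checks a hand-written list of needs, standing in for a real task model.

```python
ABILITIES = frozenset({"groups", "objects", "edges", "spatial transformations"})


def toy_solve(problem, enabled):
    """Placeholder model: succeeds only if every ability the problem needs is enabled."""
    return problem["needs"] <= enabled


def required_abilities(solve, problem, abilities=ABILITIES):
    """Return the abilities whose removal makes the model fail on the problem."""
    if not solve(problem, abilities):
        raise ValueError("the intact model must solve the problem before ablation")
    # Remove one ability at a time and record which removals cause a failure.
    return {a for a in abilities if not solve(problem, abilities - {a})}


# A hypothetical problem that depends on edge-level encoding and a rotation.
PROBLEM = {"needs": {"edges", "spatial transformations"}}
needed = required_abilities(toy_solve, PROBLEM)
```

With a real model in place of toy_solve, the same harness would tag each problem with the abstractions it demands without any hand-coding of the stimuli.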
Working Memory Load
Working memory load is simply the number of things that must be kept in mind when solving a
problem. Several studies have suggested that people are slower and less accurate at solving problems that
involve more elements or more transformations between elements (e.g., Mulholland, Pellegrino, &
Glaser, 1980; Bethell-Fox, Lohman, & Snow, 1984; Embretson, 1998; Vodegel-Matzen, van der Molen,
& Dudink, 1994). One advantage of computational models is that they can automatically code the
number of elements, number of relations, number of differences between representations, etc., that are
used to solve a problem. Thus, we can easily code a problem for working memory load. We can evaluate
to what extent these different factors (number of elements, number of relations, etc.) place a load on
working memory by considering which factors correlate with problem difficulty.
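As a sketch of this kind of analysis, the Python below ranks coded factors by the strength of their Pearson correlation with problem difficulty. The numbers are invented purely for illustration and are not results from any study; the function names are hypothetical, and the code assumes every factor varies across problems (a constant factor would cause a division by zero).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_factors(factors, difficulty):
    """Sort factors by the absolute strength of their correlation with difficulty."""
    scored = [(name, pearson(values, difficulty)) for name, values in factors.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Invented per-problem codings for five problems (illustration only).
DIFFICULTY = [0.1, 0.2, 0.4, 0.5, 0.8]      # e.g., proportion of errors
FACTORS = {
    "elements": [1, 2, 3, 4, 5],
    "relations": [3, 1, 4, 1, 5],
}
ranking = rank_factors(FACTORS, DIFFICULTY)
```

In this invented data set the number of elements correlates more strongly with difficulty than the number of relations, so it appears first in the ranking.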
Control Processes
The articles cited above argue that more complex problems are harder only because of increased
working memory load. However, an alternate explanation is that, in some cases, more complex problems
require more careful control over problem-solving strategies. For example, in Raven's Progressive
Matrices, as the number of elements increases, the problem of identifying the corresponding elements
becomes more difficult. Thus, one is more likely to identify incorrect corresponding elements on the first
try, fail to solve the problem, and require backtracking.
Carpenter, Just, and Shell (1990) have referred to this problem as goal management and have
suggested that goal management itself boils down to working memory, since more complex problems
require keeping a larger hierarchy of goals in memory. However, I believe we should consider control
processes as a separate factor, particularly as the computational models allow us to directly code the
number of operations and the amount of backtracking required to solve a problem. Thus, we can look at
whether this factor explains any of the variance in problem difficulty, beyond what working memory load
explains.
1.4 Contributions
My thesis consists of five contributions. Firstly, the Perceptual Sketchpad is an extension to the
CogSketch sketch understanding system. Given a set of objects drawn in CogSketch, the Perceptual
Sketchpad automatically segments each object into its edges and groups objects together based on
similarity. It generates human-like qualitative representations at three hierarchical levels: groups, objects,
and edges.
Secondly, Spatial Routines for Sketches (SRS) is a modeling framework inspired by Ullman's (1987)
visual routines. As with visual routines, SRS utilizes a set of cognitive operations that we know people
can accomplish. These include perceptual encoding and comparison via SME. The operations can be
combined to create a spatial routine for performing some spatial task. For example, a spatial routine
could describe one strategy for solving geometric analogy problems. A spatial routine is analogous to a
computer program: each operation takes an input, processes it, and produces an output.
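The program analogy can be made concrete with a small Python sketch. This is only an illustration under stated assumptions, not the SRS implementation: `make_routine`, `encode`, and `compare_with` are hypothetical names, images are reduced to lists of qualitative facts, and plain set overlap stands in for comparison via SME.

```python
from functools import reduce


def make_routine(*operations):
    """Chain operations so each one's output becomes the next one's input."""
    def routine(value):
        return reduce(lambda current, op: op(current), operations, value)
    return routine


def encode(image):
    """Toy perceptual encoding: an image is a list of qualitative facts."""
    return frozenset(image)


def compare_with(reference):
    """Toy comparison: set overlap stands in for a structure-mapping engine."""
    def compare(target):
        return {"shared": reference & target, "differences": reference ^ target}
    return compare


image_a = ["circle", "square", "above(circle, square)"]
image_b = ["circle", "square", "rightOf(circle, square)"]
routine = make_routine(encode, compare_with(encode(image_a)))
result = routine(image_b)
print(sorted(result["differences"]))
```

Because each operation is an ordinary function from input to output, alternative strategies for the same task can be expressed by swapping or reordering operations.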
SRS allows researchers to construct cognitive models of human performance on a task. Once a
researcher has written a routine, SRS implements its operations, allowing the routine to be tested on
for human accuracy and reaction times. In this way, the models can help us better understand what makes
a problem easy or difficult for people.
Finally, for the oddity task only, I have built separate linear models for the different cultural and age
groups. These models highlight some commonalities and differences in the representations and processes
people use to reason about space.
1.6 Outline of the Thesis
In the following two chapters, I consider the key problems of visual encoding and comparison. I
begin by outlining the psychological evidence for hierarchical hybrid representations, containing both
qualitative and quantitative features at multiple levels of abstraction. I show how even basic visual
comparisons require a strategic search through this hierarchy. I then describe my implementations of
perceptual encoding, image comparison, and shape comparison.
In Chapter 4, I present Spatial Routines for Sketches (SRS), a general framework for modeling spatial
problem-solving. I enumerate its key operations, including perceptual encoding, visual comparison, and
visual inference.
In Chapter 5, I give an introduction to cognitive modeling with SRS, discussing the general principles
that go into all my task models.
In Chapters 6-8, I describe the task models for geometric analogy, Raven's Progressive Matrices, and
the oddity task. I present results comparing each model against human performance.
In Chapter 9, I discuss predictions that follow from my models. Several hypotheses and assumptions
went into the production of each model, and these can be explicitly tested in future psychological studies.
Finally, in Chapter 10, I draw conclusions, discuss related work, and consider directions for future
research.
2. Background
Spatial problem-solving depends on the ability to compare images and identify similarities and
differences. This is difficult because even simple images contain many features, of which only a few are
critical to a given comparison. For example, Figure 2.1A displays an oddity task problem, in which one
must choose the image that doesn't belong. Each image contains four circles. The circles possess various
features (e.g., roundness), and there are several spatial relations between the circles. On the other hand,
each image also contains a row of dots. In five of the images, there is only a single row, whereas the last
image contains a row plus an additional outlying dot. Thus, it is much easier to pick the odd image out if
we view them as rows than if we view them as individual circles. However, we have no way of knowing
a priori what features will be important when we consider a problem.
Figure 2.1 Two oddity task problems, A and B, from Dehaene et al. (2006). Pick the image that doesn't
belong.
To solve these problems, one must flexibly move between candidate representations until a
comparison's critical features are discovered. Here, I argue that human perceptual systems produce
hierarchical hybrid representations (HHRs) of space. These representations are hierarchical in that a
given stimulus can be encoded at several levels of abstraction. They are hybrid in that, at each level, there
are at least two types of representations: qualitative and quantitative. For example, consider the edges of
the objects in Figure 2.1B. These edges possess several quantitative features, such as length and
orientation, that are irrelevant to the problem. However, if we focus on the qualitative relations between
edges, we see that in all images except one, there are two pairs of parallel edges.
Psychological evidence suggests hierarchical representations are explored top-down (Hochstein &
Ahissar, 2002). That is, an individual will first compare two stimuli at a high level of abstraction and
then move down as needed. In contrast, the temporal interaction between qualitative and quantitative
representations is less clear (Bornstein & Korda, 1984; Kosslyn et al., 1977). However, in both cases,
individuals may strategically reprioritize representations, depending on the task demands.
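To make the top-down search concrete, here is a minimal Python sketch of a hierarchical hybrid representation and a comparison that descends only as far as needed. Everything in it is an assumption made for illustration: the level names follow the groups, objects, and edges hierarchy described earlier, `LevelDescription` and `first_informative_level` are hypothetical names, and only the qualitative component is compared because the ordering of qualitative and quantitative comparison is left open above.

```python
from dataclasses import dataclass, field

LEVELS = ("groups", "objects", "edges")  # most abstract level first


@dataclass
class LevelDescription:
    qualitative: frozenset = frozenset()              # e.g. {"row", "parallel(e1, e2)"}
    quantitative: dict = field(default_factory=dict)  # e.g. {"length": 3.2}; carried but not compared here


def first_informative_level(image_a, image_b, levels=LEVELS):
    """Return the most abstract level at which the qualitative facts differ."""
    for level in levels:
        if image_a[level].qualitative != image_b[level].qualitative:
            return level
    return None  # no qualitative difference at any level


typical = {
    "groups": LevelDescription(frozenset({"row"})),
    "objects": LevelDescription(frozenset({"circle"})),
    "edges": LevelDescription(frozenset({"curved"})),
}
odd = {
    "groups": LevelDescription(frozenset({"row", "outlier"})),
    "objects": LevelDescription(frozenset({"circle"})),
    "edges": LevelDescription(frozenset({"curved"})),
}
print(first_informative_level(typical, odd))
```

For a problem like Figure 2.1A the search stops at the group level, whereas a problem whose images differ only in edge relations would force the search to the bottom of the hierarchy.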
Top-down comparison of hierarchical representations may depend on identifying corresponding
elements in structural descriptions. I will argue that structure-mapping theory (Gentner, 1983, 1989)
provides a parsimonious explanation of how people align representations during visual comparison.
I begin by presenting psychological evidence for hybrid and for hierarchical representations. I then
discuss structure-mapping as a mechanism for visual comparison over HHRs. Afterwards, I apply HHRs
to mental rotation, a commonly studied comparison task. Finally, I consider how individuals strategically
search through HHRs during comparison.
2.1 Hybrid Representations
Evidence for hybrid representations comes from at least three bodies of research: categorical
perception, object recognition, and visual priming. I consider each of these in turn. Many of the
experiments below use the same/different paradigm. In visual same/different experiments (Farell, 1985),
participants are shown two images, either sequentially or simultaneously, and asked whether they are the
same or different. By varying the type and degree of differences, researchers can study the form of visual
representations. Furthermore, by varying the exposure time and the interval between stimuli, researchers
can potentially isolate the contributions of encoding, memory, and comparison processes. As we shall
see, same/different experiments provide evidence for at least two distinct representations: qualitative and
quantitative.
Categorical Perception
Categorical perception is our ability to use qualitative categories when distinguishing between
quantitative values. In categorical perception experiments, participants compare stimuli that typically
vary along a single dimension (e.g., color: Bornstein & Korda, 1984; size: Kosslyn et al., 1977; or
location: Maki, 1982). A common finding is that the more distant the stimuli are along that dimension,
the more quickly an individual can recognize that they are different, or that one is greater than the other.
However, when the stimuli have different categorical values, performance improves significantly.
For example, Bornstein and Korda (1984) showed participants color patches in a sequential
same/different task. They used seven patches that varied from blue to green, such that the middle patch
(4) was ambiguous in color. Thus, they could independently vary the quantitative difference and the
qualitative difference (same-color vs. different-color) between patches. They found that for pairs with a
distance of two, participants responded "different" faster when the patches were different colors than
when they were the same color, suggesting that participants used the color labels when comparing.
However, participants could also distinguish between different patches with the same label, indicating
they had access to the quantitative hue value. Bornstein and Korda theorized that participants were
comparing the quantitative and qualitative color values in parallel, and responding as soon as either
comparison returned a difference.
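The parallel account can be sketched as a simple race between two comparison processes. The Python code below is illustrative only: the timing values are arbitrary, the assignment of patches 1-3 to "blue" and 5-7 to "green" is an assumption, and the function names are hypothetical rather than taken from Bornstein and Korda (1984).

```python
import math

# Assumed labels for the seven patches; patch 4 is the ambiguous middle patch.
LABELS = {1: "blue", 2: "blue", 3: "blue", 4: None, 5: "green", 6: "green", 7: "green"}


def quantitative_time(patch_a, patch_b, scale=1200.0):
    """Larger hue distances are detected faster; identical patches never differ."""
    distance = abs(patch_a - patch_b)
    return scale / distance if distance else math.inf


def qualitative_time(patch_a, patch_b, label_time=500.0):
    """A label mismatch is detected in constant time; otherwise no signal."""
    label_a, label_b = LABELS[patch_a], LABELS[patch_b]
    if label_a is None or label_b is None or label_a == label_b:
        return math.inf
    return label_time


def time_to_respond_different(patch_a, patch_b):
    """Race model: respond as soon as either comparison reports a difference."""
    return min(quantitative_time(patch_a, patch_b), qualitative_time(patch_a, patch_b))


within_category = time_to_respond_different(1, 3)   # both labeled blue
across_category = time_to_respond_different(3, 5)   # blue vs. green
print(within_category, across_category)
```

With these made-up parameters, a pair at distance two is judged different sooner when it straddles the category boundary, yet a same-label pair still produces a finite response time because the quantitative comparison eventually finishes.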
Categorical perception has been studied extensively in the color domain (e.g., Bornstein & Korda,
1984; Roberson, Pak, & Hanley, 2008; Regier & Kay, 2009). However, it has also been found in other,
more spatial domains. Kosslyn et al. (1977) taught participants to draw 6 stickmen of varying sizes and
also trained them to characterize the three largest as "large" and the three smallest as "small". Similarly,
Maki (1982) taught participants the locations of twelve cities on a map, in which six cities were in a
northern state and six were in a southern state. Afterwards, participants were given the names of stimulus
pairs and asked to judge which was larger or farther north. Both researchers found that participants
responded faster when the stimuli were farther apart along the relevant dimension. However, participants
also responded faster to stimuli on opposite sides of the category boundary they had learned.
Categorical perception has also been found for spatial relations between an object's parts. Rosielle
and Cooper (2001) used line-drawings of 3-D objects in a sequential same/different study. Each object
had two parts connected by an arm. The arm's orientation varied along four values: 0°, 30°, 60°, and 90°.
For the first value (0°), participants might recognize a qualitative "parallel" relationship between the
parts. For the last value (90°), participants might recognize a "perpendicular" relationship. However,
either of the middle two values would likely be labeled as "oblique". Rosielle and Cooper compared
performance across the conditions where the two objects had identical parts but their angles differed by
30°. They found that participants responded slowest (and error rates were highest) when the object
changed from one oblique angle to the other. Participants performed better when the object changed to or
from a parallel or perpendicular angle, suggesting they were encoding and using these qualitative spatial
relations.
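The qualitative labels in this design can be written out as a small Python sketch. The 5-degree tolerance and the names `categorize_angle` and `is_qualitative_change` are assumptions for illustration; the source specifies only the four angle values and the three candidate labels.

```python
def categorize_angle(angle_deg, tolerance=5.0):
    """Map an angle between two parts onto a qualitative label."""
    angle = abs(angle_deg) % 180.0
    if angle <= tolerance or angle >= 180.0 - tolerance:
        return "parallel"
    if abs(angle - 90.0) <= tolerance:
        return "perpendicular"
    return "oblique"


def is_qualitative_change(angle_a, angle_b):
    """True when a change in angle also changes the qualitative label."""
    return categorize_angle(angle_a) != categorize_angle(angle_b)


# The three 30-degree changes available among the four angle values.
for before, after in [(0, 30), (30, 60), (60, 90)]:
    print(before, after, is_qualitative_change(before, after))
```

Of the three equally sized changes, only the one between 30 and 60 degrees leaves the label unchanged, which is the condition in which participants were slowest and least accurate.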
Finally, Ferguson, Aminoff, and Gentner (1996) showed that categorical perception can play a role
even when perceiving a single object. They showed participants 2-D polygons and asked them to
evaluate whether each polygon possessed an axis of symmetry. This is essentially a same-different task
in which the participant is comparing one half of a shape to the other. The experimenters looked at the
time and accuracy to detect purely quantitative asymmetries vs. qualitative asymmetries, where the latter
included changes in the number of vertices and changes between concave and convex vertices. The
experimenters attempted to keep the degree of quantitative asymmetry constant while varying the
presence of qualitative asymmetries. They found that participants were better able to detect qualitative
asymmetries, suggesting again that participants use qualitative spatial relations during comparison.
These studies indicate that participants consider both quantitative and qualitative features during
visual comparison. Features may be purely visual (e.g., color) or spatial. Spatial features may include
attributes of individual objects or relations between elements (e.g., relative orientation, convex vs.
concave). It is less clear whether individuals prefer one type or the other. Some researchers (Bornstein &
Korda, 1984; Kosslyn et al., 1977) have suggested that qualitative and quantitative features are compared
in parallel, so that a large difference along either dimension produces a fast response. We shall return to
this question later.
Object Recognition
A heated debate in the field of object recognition provides further evidence. Here, the question is
how people rapidly recognize objects from different viewpoints. Biederman and colleagues (Biederman,
1987; Hummel & Biederman, 1992) have proposed that people represent an object as a set of shape
primitives, called geons, with qualitative spatial relations between the geons. Geons and relations are
identified based on non-accidental properties, properties of the image that are unlikely to be tied to a
specific viewpoint. For example, if two edges along a shape's surface appear parallel from one
viewpoint, they will also appear parallel from most other viewpoints. Thus, a given object should
produce a similar geon representation from many different viewpoints, facilitating the recognition
process.
A set of geons, features of the geons, and spatial relations between geons, make a geon structural
description (GSD). This approach aligns with the qualitative representations described above in that
GSDs capture qualitative features of, and relations between, the parts of an object. It also provides some
pointers to what qualitative features might be encoded: those features which are unlikely to have
occurred accidentally, due to viewing a scene from a particular viewpoint.
Biederman and colleagues predict that any rotation which does not alter an object's GSD should not
interfere with one's ability to recognize it. They have shown (Biederman & Gerhardstein, 1993) that
participants can quickly recognize two objects as the same, regardless of rotation in depth, provided they
generate the same GSD. Similarly (Biederman & Bar, 1999), participants can quickly recognize that two
objects are different, regardless of rotation in depth, when they generate different GSDs, i.e., they differ in
a non-accidental property. However, when two objects differ only in a quantitative property, recognizing
they are different over rotations is difficult. This fits well with the above finding (Rosielle & Cooper,
2001) that participants can better tell two objects are different when the change in their angles is
qualitative, rather than quantitative.
However, these results have not gone unchallenged. Tarr and colleagues (Tarr et al., 1997; Tarr et al.,
1998; Hayward & Tarr, 2000) have argued that we use viewpoint-dependent representations to recognize
objects. They claim that we deal with multiple viewpoints by learning several different representations
for a given object, and by mentally transforming between a novel view and the closest known view.
Thus, they predict that when an object is unfamiliar, there should be a cost for recognizing it from a novel
viewpoint. In support of this, their studies have shown a cost for recognizing objects from new
viewpoints, even when the GSDs are mostly identical. These studies used the same paradigms (Tarr et
al., 1997) and sometimes even the same stimuli (Tarr et al., 1998) as Biederman and colleagues. Thus,
apparently very subtle differences can shift participants between representations that are more or less
viewpoint-specific.
Biederman and Bar (1999) suggest one factor is the accessibility of the geons in a 3-D shape. Even
differences to a single vertex can make it harder to recognize a rotated version of an object (Biederman &
Bar, 1998). Of course, this means the viewpoint-invariant representation is quite brittle. If it is so easily
disrupted, then one cannot expect two viewpoints of the same object to produce identical representations
consistently. On the other hand, viewpoint-invariant recognition is achievable in some situations
(Biederman & Gerhardstein, 1993). Thus, the results suggest: a) we can access a viewpoint-invariant
representation, such as GSD; b) this representation can be used to quickly compare two object images; c)
when this representation is insufficient, we must fall back on viewpoint-specific information to perform
the comparison. Here, again, we see a hybrid perceptual representation. One component is qualitative
and primarily viewpoint-invariant. The other contains quantitative and viewpoint-specific information.
Visual Priming
Further support for hybrid representations comes from visual priming studies. Hummel and
colleagues (Stankiewicz, Hummel, & Cooper, 1998; Stankiewicz & Hummel, 2002; Thoma, Hummel, &
Davidoff, 2004) have argued that a viewpoint-invariant representation such as GSD can only be generated
when one attends to an image. Attention allows one to identify the parts of an object and bind them to
features and relations (Treisman & Gelade, 1980), thus generating a structural description. If an
individual is exposed to an image but fails to attend to it, they will generate a more holistic, viewpoint-
specific representation.
In their experiments, two objects are displayed on the screen together. Participants are cued to attend
to one, while they are given very little time to notice the other; typically, the two objects together are
displayed for only 120 ms. Afterwards, participants are shown another object and asked to name it. The
experimenters predict that if an individual attends to an object, they will generate a viewpoint-invariant
representation that will prime them for naming that object again, even if the object is transformed.
However, if an individual doesn't attend to an object, they will generate only a viewpoint-specific
representation. This representation will support priming across a much narrower set of transformations.
Hummel and colleagues found that when participants attended to an object, they were primed to
recognize it again, even if the original object was mirror-reflected (Stankiewicz, Hummel, & Cooper,
1998), or scrambled (Thoma, Hummel, & Davidoff, 2004) by splitting the image into two halves and
inverting them. Importantly, the priming for both mirror-reflected and scrambled images was greater than
the priming for seeing a different object from the same class (e.g., two different pianos), so this was not
merely semantic priming.
In contrast, an ignored image failed to prime mirror-reflected or unscrambled versions of itself,
although it did prime an identical version, as well as larger or smaller versions (Stankiewicz & Hummel,
2002). Curiously, a scrambled, ignored image failed to prime even an identical scrambled image (Thoma,
Hummel, & Davidoff, 2004). This, along with the size invariance, suggests that the ignored image does
not simply produce a pixel-by-pixel representation. Some amount of abstraction or recognition appears to
occur even for ignored images, although it is insufficient for supporting transformations such as mirror
reflection.
Characterizing Qualitative Representations
We have now seen strong evidence for a qualitative/quantitative distinction in visual representations,
as well as reasonably strong evidence for a viewpoint-invariant/viewpoint-specific distinction. However,
two questions remain: 1) Are the qualitative and the quantitative simply two sets of features in a single
representation, or are there real differences in the ways they are represented and reasoned over? 2) How
closely tied together are the qualitative representation and the viewpoint-invariant representation? To
better answer these questions, I will now make some claims about the nature of qualitative
representations.
1) Qualitative representations appear linked to symbolic processing in the brain
Qualitative features (e.g., blue, above, or parallel) often correspond to single words, whereas
quantitative features (e.g., 3 inches to the right) often do not. Therefore, if qualitative and quantitative
information are represented separately in the brain, one might suppose that the left hemisphere, known for
language-processing (at least in right-handed ind