
The induction of solution rules in Raven’s Progressive Matrices Test

Tom Verguts and Paul De Boeck
K.U. Leuven, Belgium

In this paper, we study the rule induction process in a popular intelligence test, Raven’s Advanced Progressive Matrices test (RPM; Raven, 1962). Carpenter, Just, and Shell (1990) have shown that only a few rule types are necessary to describe all items of the test. An independent question that has not been investigated is whether participants profit from this, that is, do they learn to apply these rules more fluently throughout the test? We show that this is the case and look in detail at different aspects of this learning process. The relevance of our findings is discussed for a process theory on solving RPM items.

The Raven’s Advanced Progressive Matrices (RPM) test is an intelligence test widely used both in applied and research settings (e.g., Arthur & Day, 1994; Jensen, 1987, respectively). The test correlates with many (other) indices of intellectual functioning (Marshalek, Lohman, & Snow, 1983). Therefore, it is often treated as a measure of g, or general intelligence. For example, in the cognitive correlates tradition, some researchers have investigated the role of processing speed in elementary cognitive tasks (ECTs) on intelligence by correlating the reaction time on these ECTs with RPM performance (e.g., Carlson, Jensen, & Widaman, 1983; Jensen, 1987). Other researchers have added working memory measures to this battery and compared correlations between, on the one hand, ECT performance and working memory measures, and, on the other hand, RPM performance (e.g., Fry & Hale, 1996; Salthouse, 1991). Still others have investigated the role of nerve conduction velocity (NCV) on intelligence by correlating NCV with RPM scores (Reed & Jensen, 1992).

Still, despite its presence in many studies directly or indirectly related to intelligence, few attempts have been made to study the RPM solution process in great detail. Such an account would be useful, however, since it could shed light on why and under what circumstances one should expect the RPM test to correlate with these other measures.

EUROPEAN JOURNAL OF COGNITIVE PSYCHOLOGY, 2002, 14 (4), 521–547

Requests for reprints should be addressed to Tom Verguts, who is now at Department of Psychology, Ghent University, H. Dunantlaan 2, 9000 Ghent, Belgium. Email: [email protected]

We wish to thank Frank Rijmen, Gilbert Vander Steene, and Johan Wagemans for their useful comments.

© 2002 Psychology Press Ltd http://www.tandf.co.uk/journals/pp/09541446.html DOI:10.1080/09541440143000230

Two exceptions are the theories provided by Carpenter, Just, and Shell (1990) and Embretson (1995). We will here focus on Carpenter et al.’s work. Carpenter et al.’s task analysis of the RPM test reveals that a few (specifically, five) rules can be used to solve all items of the test. Starting from that idea, the authors propose a computer simulation model that can find the relevant rules in an item by trying to apply these different rules in succession. Since several rules are often required in order to solve one particular item, sub-results must also be stored during the problem-solving process. The authors propose two factors of individual differences in RPM solving ability: first, the number of rules a person has available; and second, the number of sub-results that can be stored simultaneously by a person.

An independent question that arises naturally from Carpenter et al.’s (1990) study is whether participants also actually use these same rules repeatedly while solving the RPM test. Furthermore, given that this is the case, it might be hypothesised that these rules become activated during test taking and therefore people become biased toward using these particular rules. Such an effect would be in line with the Einstellung, or mental set effect that has been found in the water jars task (e.g., Dover & Shore, 1991; Luchins & Luchins, 1954) and similar tasks (Lovett & Anderson, 1996). Following Luchins and Luchins’ study, the influence of a problem solving set on current problem solving has been extended to a variety of tasks, comprising anagrams (Kaplan & Schoenfeld, 1966; Lemay, 1972; White, 1988), insight problems (Duncker, 1945; Maier, 1931) and baseball knowledge (Wiley, 1998).

The present paper is an attempt to investigate the mental set effect in RPM-like items. As noted in the previous paragraph, this is not the first paper to report a mental set effect in cognitive tasks. However, what is new is that (1) the effect is investigated with and obtained with RPM (or RPM-like) items, and (2) a formal model is described that allows testing different mental-set aspects in the data, such as the existence of a lag effect (i.e., differential influence of earlier versus more recent items on current item solving).

Given that the mental set effect has been reported before, it might seem obvious to expect one in the RPM data as well. However, almost all research in the (individual difference) literature concerned with the RPM assumes, usually implicitly, that no learning effects occur. For example, different factor analyses have been performed on the RPM test to discover its underlying dimensionality (e.g., Alderton & Larson, 1990). Such an analysis assumes that one or more constant person abilities and item difficulties are involved during test taking. The presence of learning in the RPM would violate these assumptions.

Although this mental set effect may sometimes hinder finding the correct rule, we think a mental set can often actually be beneficial for the following reason. The rules of the first items of the test are usually very easy to find. It is believed that these rules are then activated in the mind of the testee. As one progresses through the test, the items become more and more difficult,¹ and it is no longer the case that the correct rule is immediately elicited. However, this situation only occurs after some solution rules have already been activated, namely, those of the earlier items. We assume that the testee will try out the rules she has tried before, and the probability (or insistence) of trying out a certain rule depends on the activation of that rule, that is, on its occurrence in previous trials. This positive outlook on the mental set effect is in line with the sequence effect (Sweller & Gee, 1978). This is the effect that if problems are ordered from easy to hard, the problems are much easier to solve than when they are ordered from hard to easy. More generally, Ross (1984, 1987) has given a similar account of learning a cognitive skill, in which he emphasises the role of remindings of earlier problem-solving episodes on current problem solving.

Lovett and Anderson (1996), in the context of their building sticks task, emphasize that both past and present situation aspects determine the probability that a particular rule is chosen. The present paper concentrates on the first aspect, namely, the influence from past experience on current problem solving. More specifically, we will study the influence of previous items (in the same test) on the current item. The second aspect, the influence from the present situation aspects (i.e., stimulus features in the presented item), will be discussed briefly in Experiment 2, but it is not the focus of our empirical attention.

The remainder of this paper is organised as follows. First, we explain the structure of typical RPM items in some detail. Second, we develop a formal model that will allow us to investigate different aspects of the solution process. Third, we describe the first application of the theory in Experiment 1. Finally, we describe a second application in Experiment 2, which is followed by a General Discussion.

RPM ITEMS

An example of an RPM item is given in Figure 1. (For copyright reasons, the item is invented, but it is similar to a real RPM item.) The goal is to pick one of the eight response alternatives in the lower part that according to a logical rule completes the 3 × 3 matrix of elements in the upper part. There are 36 such items. Usually, a time limit of 40 minutes is imposed on the total test.

In principle, an item could be solved rowwise, columnwise, or by a combination of the two. However, we assume here and in the following that each item is solved rowwise. Also, participants in our Experiments are always instructed to solve items rowwise. The correct answer alternative is 3 in this case; it is the addition of the black parts of the first and second elements in the third row of the matrix. The relevant (logical) rule in this item can therefore be stated as “add the parts of the first and second element to obtain the third one”.

¹ As an illustration, in a previous study of ours concerned with the RPM (Verguts, De Boeck, & Maris, 1999) the full RPM test was administered with a time limit of 40 minutes (which is usual). The first 10 items had an average probability of success of .957, so they are quite easy as claimed. The last 10 items had a probability of success of only .442. The total test consists of 36 items.

A second, and slightly more difficult, item is shown in Figure 2. This item obeys the following rule: In each row, the fork is making a rotational movement. If this rule is applied to the third row, it is clear that answer alternative 3 is the correct one. In this case, the relevant rule may be stated as “a figure is rotating over the columns”.

Figure 1. Example of an easy RPM item.


Yet a third and more difficult item is given in Figure 3. This item resembles the last, and most difficult, item of the RPM test. The item can be solved as follows. For the outer lines (i.e., the diagonal lines), the lines that elements 1 and 2 (within a row) have in common also appear in element 3. For the inner lines (i.e., the ones attached to the central dot), only the lines that are unique in element 1 or 2 appear in element 3. Applying this rule to the third row, one sees that alternative 2 is the correct response. This third item will serve as the basis of all items in our first Experiment, as will be explained later in the paper.

Figure 2. Second example of an easy RPM item.

A MODEL FOR MENTAL SET

Figure 3. Example of a difficult RPM item.

Denote by $Y_{pi}$ the variable that indicates for item i which rule was chosen by participant p. Further, let the value $x_{ik}$ indicate whether item i is of rule type k and this information was available for the participant after having responded. Hence, if both item i was of type k and this information was available, then $x_{ik} = 1$; otherwise, $x_{ik} = 0$. The relevant information (i.e., that item i was of type k) can, for example, become available by the participant finding the appropriate rule k. Another way this information can become available is when the correct rule is told by someone else (e.g., the experimenter). We then make the following assumption on the probability that rule k is chosen

$$
\Pr(Y_{pi} = k) = \frac{\exp\left[\lambda_k + \beta_p \sum_{j=1}^{i-1} x_{jk}\right]}{\sum_{m=1}^{K} \exp\left[\lambda_m + \beta_p \sum_{j=1}^{i-1} x_{jm}\right]} \tag{1}
$$

if K rules are involved. The parameter $\lambda_k$ denotes the initial strength of rule k. The parameter $\beta_p$ is a (person dependent) learning parameter and scales the effect of previous usage of rule k (factor $\sum_{j=1}^{i-1} x_{jk}$) on the current probability of using rule k. Note that one expects $\beta$ to be non-negative. If $\beta$ equals zero, then no learning occurs. Each person is assigned his/her own learning parameter $\beta_p$. The sizes of the estimated $\beta$ values will inform us about the extent of learning in the data and individual differences in learning rate. The numerator may be regarded as the activation value of rule k, for person p, upon presentation of item i. In the denominator, the summation is taken over all such rule activation values. Hence, the probability that rule k is chosen depends on the activation of that rule relative to all other rules. It can be noted that the present model (1) is similar (but not identical) to the dynamic Rasch model proposed by Verhelst and Glas (1993).
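As a minimal sketch of equation (1), the Python function below computes the rule-choice probabilities for one participant on one item. The function name `choice_probabilities`, the example strengths, and the feedback history are illustrative assumptions and are not values taken from the study.

```python
import math

def choice_probabilities(lam, beta, history):
    """Equation (1): probability of choosing each of K rules on the current item.

    lam     -- list of K initial rule strengths (lambda_k)
    beta    -- learning parameter of the participant (beta_p)
    history -- earlier items j = 1..i-1; each entry is a list of K indicators
               x_jk (1 if item j was of rule type k and the participant
               received that information, else 0)
    """
    K = len(lam)
    # Exponent per rule: lambda_k + beta_p * sum_j x_jk
    counts = [sum(item[k] for item in history) for k in range(K)]
    exponents = [lam[k] + beta * counts[k] for k in range(K)]
    # Subtracting the maximum before exponentiating avoids overflow and
    # leaves the ratios in equation (1) unchanged.
    top = max(exponents)
    activations = [math.exp(e - top) for e in exponents]
    total = sum(activations)
    return [a / total for a in activations]

# Hypothetical example: three rules (inside, above, other), equal initial
# strengths, and a history of three inside items and one above item.
history = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
no_learning = choice_probabilities([0.0, 0.0, 0.0], 0.0, history)
with_learning = choice_probabilities([0.0, 0.0, 0.0], 0.5, history)
```

With a learning parameter of zero the three rules stay equally likely whatever the history; with a positive learning parameter the rule seen most often in the history receives the highest probability.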

To investigate a lag effect (i.e., a differential influence of items that were shown recently versus some time ago), the model can be extended as follows:

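$$
\Pr(Y_{pi} = k) = \frac{\exp\left[\lambda_k + \beta_p \sum_{j=1}^{i-1} x_{jk}\,(i-j)^{\gamma}\right]}{\sum_{m=1}^{K} \exp\left[\lambda_m + \beta_p \sum_{j=1}^{i-1} x_{jm}\,(i-j)^{\gamma}\right]} \tag{2}
$$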

In this extended model, if the parameter $\gamma$ is smaller than zero, then a recency effect appears in that recent items have more influence than earlier items (note that i > j and so i – j > 0). If the parameter $\gamma$ is larger than zero, a primacy effect occurs because in that case, earlier items have a stronger influence than recently presented items. If the parameter $\gamma$ equals zero, then model (2) reduces to model (1). There is only one $\gamma$ parameter for all persons. All parameters are unrestricted, that is, they can in principle take any value between minus and plus infinity.
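The role of the lag parameter can be illustrated with a short Python sketch of equation (2). The function names, the two-rule setup, and the history below are hypothetical and chosen only so that one rule was seen long ago and the other just before the current item.

```python
import math

def lagged_sums(gamma, history, i, n_rules):
    """Return sum_j x_jk * (i - j)**gamma for each rule k, with j = 1..i-1."""
    sums = [0.0] * n_rules
    for j, item in enumerate(history[: i - 1], start=1):
        weight = (i - j) ** gamma  # i - j >= 1, so any real gamma is allowed
        for k in range(n_rules):
            sums[k] += item[k] * weight
    return sums

def choice_probabilities_lag(lam, beta, gamma, history, i):
    """Equation (2): rule-choice probabilities for item i with lag weighting."""
    sums = lagged_sums(gamma, history, i, len(lam))
    exponents = [l + beta * s for l, s in zip(lam, sums)]
    top = max(exponents)  # stabilise the exponentials
    activations = [math.exp(e - top) for e in exponents]
    total = sum(activations)
    return [a / total for a in activations]

# Hypothetical history before item 5: rule A was seen on item 1 (lag 4),
# rule B on item 4 (lag 1); items 2 and 3 involved neither rule.
history = [[1, 0], [0, 0], [0, 0], [0, 1]]
recency = choice_probabilities_lag([0.0, 0.0], 1.0, -1.0, history, 5)
primacy = choice_probabilities_lag([0.0, 0.0], 1.0, 1.0, history, 5)
neutral = choice_probabilities_lag([0.0, 0.0], 1.0, 0.0, history, 5)
```

In this sketch a negative lag parameter favours rule B (the recent one), a positive value favours rule A (the early one), and a value of zero weights both items equally, which is the behaviour of model (1).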


In reality, a solution is often found at the end of a string of (preliminary) responses, i.e., a sequence of rules (or temporary hypotheses) that are temporarily accepted, tried out, and possibly rejected. With equations (1) and (2), we will model only the very first response on every item, so that every item effectively consists of one trial only (see Lovett & Anderson, 1996). These responses are obtained by examining the think-aloud protocol of each participant (see Procedure for the first experiment, later). The reason is that studying the other responses as well would require introducing dependencies between responses, which would make an interpretation of the data less straightforward. However, we will also analyse the accuracy data (correct/incorrect scores) without trying to formally model these, as described later on.

It should be emphasised that these models are not so much intended to be “new” cognitive models. Instead, they are intended to be the simplest models that incorporate learning and that can inform us about learning aspects in the current data, with or without a lag effect (for models 2 and 1, respectively).

EXPERIMENT 1

Administering the RPM itself to test the learning effect raises a number of problems. First, the items of the RPM are not designed to have a balanced or systematic representation of the rules, nor are they designed to test the kind of hypothesis we want to test. Second, the solution process in the RPM entails more than just finding the correct rule, whereas our focus is only on the latter process. For example, in the RPM, once the rule is found it needs to be applied correctly. For these two reasons, we will devise a test with a modified response design in order to concentrate on the aspects that are most relevant for our purpose. This makes the model more realistic for the data whereas the essential learning aspects of solving the RPM still apply. Similar modifications were made by Lovett and Anderson (1996) (also Lovett & Schunn, 1999) to the water jars task in order to study that particular task more clearly. The four modifications in our case are the following.

(1) In the real RPM items, a rule should be found for two rows, which should then be applied to the third row (see Figures 1, 2, and 3). In the items of this experiment, on the other hand, the participant is given only one row, in which the correct rule is to be found. The advantage is that the emphasis of solving an item is directed toward finding the correct rule.

(2) We will ask participants to talk aloud while thinking, so that we can know what the first activated rule is. Ericsson and Simon (1984) have argued that as long as participants are questioned during the problem-solving process, the talk-aloud technique is a useful one to find out how people solve complex tasks (see also Veenman, Elshout, & Groen, 1993). On the other hand, DeShon, Chan, and Weissbein (1995) have shown that concurrent verbalisation may have a (small) effect on the performance of some RPM items. Nevertheless, many studies have profitably used the concurrent speech procedure (e.g., Carpenter et al., 1990; Kotovsky & Simon, 1973), and it is, in our opinion, one of the most direct ways to discover how participants solve such items.

(3) The RPM test is given either without a time limit (e.g., Jensen, 1987) or with a general time limit over items (usually 40 minutes, e.g., Raven, 1962). In the present test, on the other hand, a limit of 30 seconds per item was imposed for purposes of standardisation.

(4) If the item is solved correctly in time, the experimenter indicates that the item was solved correctly (i.e., “That’s right, go on”). If the item is not solved or not solved correctly in the allotted 30 seconds, the experimenter tells the participant the correct rule. This way, all participants can be assumed to receive the same information after every item, either from solving the item themselves or from the solution given by the experimenter. This will be called an explicit feedback procedure. More specifically, we can assume that for all items i of type k it holds that $x_{ik} = 1$.

Method

Description of the items. The test consists of 5 series of items. The idea here is to gradually introduce the different possible rules in different amounts for two conditions. This allows us to investigate the role of pre-exposure to the different rules.

Series 1 consists of three types of items, denoted Unique items, Common items, and Addition items. An example of a unique item is given in Figure 4(a). This is denoted a unique item because the third element keeps the unique parts (lines) from elements 1 and 2. This is the same rule as the one for the interior lines of the item in Figure 3, as discussed earlier in the text. Similarly, in a common item of series 1, only the common parts of elements 1 and 2 appear in element 3. This is analogous to the rule for the exterior lines of the item in Figure 3. Finally, we have addition items, where all lines of elements 1 and 2 are added in element 3. Series 1 contains 10 items, namely, two addition, four unique, and four common items. Common and unique items appear more often (four exemplars each) because they are used in the critical series 5 (see discussion later). Items of series 1 are used to familiarise the participant with the rules addition, unique, and common.

A typical series 2 item is given in Figure 4(b). Here, a distinction should be made between the outer and the inner lines. On the outside, the unique rule holds, while on the inside, the common rule is valid. Items where the inside/outside distinction is relevant will be termed inside items. Moreover, the item in Figure 4(b) will be called an inside-common-unique item because rule common is necessary on the inside, unique on the outside. With this terminology, the item of Figure 3 is an inside-unique-common item because unique is the appropriate rule on the inside, common on the outside. Note, however, the important difference between the item of Figure 3 and the item in Figure 4(b) in that the latter strongly prompts the inside/outside distinction by the separation of the inner and the outer parts, while the same is not true for the item of Figure 3. The idea here is that the items of series 1 and 2 activate the rules needed in the later series, where the items are more difficult.

Figure 4. Examples of an item in series 1 (a), series 2 (b) and (c), and series 3, 4, or 5 (d) in Experiment 1.

A second item of series 2 is given in Figure 4(c). Here, the upper/lower distinction is the relevant one, as is clearly indicated in the item by the separation of upper and lower parts. We will denote such items as above items. The reader can determine that, analogously to what has been mentioned before, this is an above-common-addition item. Note that the middle horizontal lines should be treated as upper lines, rather than as lower lines; the choice, of course, is arbitrary, but they are consistently treated as upper lines throughout the test. Series 2 consists of 10 items, namely, 5 above and 5 inside items.

The focus of our analysis is on the rules inside versus above. However, a pure above or inside item is not possible, and these items are therefore always completed with two rules for the two parts. These completing rules are of the type unique, common, or addition (e.g., the first item of series 2 is inside-addition-unique, the second inside-addition-common, and so on).

Items of series 3, 4, and 5 look similar to one another. A typical item is given in Figure 4(d). This is an inside-unique-common item, but the difference with series 2 items is that the correct distinction is not indicated by a gap separating the interior and exterior, or the upper and lower parts of the item. Hence, these items are analogous to the difficult items in the real RPM, in which the correct rule is not easy to find, but in which one profits from the rules used earlier on in the test. Series 3 and 4 each contain 10 items, series 5 contains 12 items. How many items of the different types there are in each series is discussed next.

Design. There are no inside or above rules in series 1. Each of the rules inside and above is presented five times in Series 2. Series 3 and 4 are most critical. In condition 1, participants receive a large number of inside items in series 3 and 4. In condition 2, they are given a large number of above items in series 3 and 4. More specifically, participants in condition 1 receive 8 inside items and 2 above items for the total of series 3 and 4 (and 10 filler items, items in which neither rule is necessary), while persons in condition 2 receive 2 inside items and 8 above items for the total of series 3 and 4 (and 10 filler items). These filler items are always (complete) unique items and are the same for the two conditions.

Series 5 consists of an equal number of inside and above items, six items each, in each condition. The crucial prediction is that people of condition 1 will more often think of applying the inside rule when solving the items of series 5, whereas people of condition 2 will more often think of the above rule.
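The design fixes how often each rule has been presented before series 5 begins. The tally below simply adds the five presentations of each rule in series 2 to the presentations in series 3 and 4; it is a worked restatement of the numbers given above, not an additional result.

$$
\begin{aligned}
\text{Condition 1:}\quad & \text{inside} = 5 + 8 = 13, &\quad \text{above} = 5 + 2 = 7,\\
\text{Condition 2:}\quad & \text{inside} = 5 + 2 = 7,  &\quad \text{above} = 5 + 8 = 13.
\end{aligned}
$$

Both groups therefore enter series 5 having met both rules, with the frequent rule seen 13 times and the infrequent rule 7 times.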


For the analysis and modeling of the data, we will consider three kinds of rules: inside, above, and a rest category. The rest category contains all other responses given by participants. Hence, the variable $Y_{pi}$ can take on three values (inside, above, and “other”).

Participants. Six persons participated in each condition (so N = 6 × 2 = 12). Each received a small amount of money for participation.

Procedure. The test is a computerised one. After the instructions are given, participants solve series 1 to 5 (each separated by a short break) by thinking and talking aloud into a microphone. If the item is solved (i.e., the rule is found) within 30 seconds, the experimenter notes that the item was correctly solved (by telling them “That’s right”), and the participant is asked to press the space bar to proceed to the next item. If the item is not solved in the allotted 30 seconds, the rule involved in the item is explained. Then, the participant is asked to press the space bar in order to go to the next item. The responses are written down afterwards by listening to the audiotape.

Data analysis. Four types of analysis will be presented. First, since we report both choice data (e.g., Equation 1) and accuracy data, we check whether it is useful to analyse choice and accuracy data separately. Therefore, we calculate the chi-square test for association (Siegel & Castellan, 1989, p. 111) between the binary variables “Is the item solved correctly?” (i.e., success on a particular item, or accuracy) and “Is the first rule chosen the correct one?” Remember that we model the very first choice on any particular item. Hence, if choice data and accuracy data contain the same information, the correlation between these two variables (accuracy and accuracy of the first choice) should be close to 1. We calculate the correlation between these two variables as a descriptive index of the amount of association between choice and accuracy. This analysis will be called the association data analysis.
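For two binary variables, the association test and the descriptive correlation can both be computed from a 2 × 2 table of counts. The sketch below assumes the Pearson chi-square without continuity correction and uses the phi coefficient as the correlation; the function name and the example counts are hypothetical and do not come from the experiment.

```python
import math

def chi_square_and_phi(a, b, c, d):
    """Pearson chi-square (no continuity correction) and phi for a 2 x 2 table.

    Cell layout (counts of responses):
                              final answer correct   final answer incorrect
        first rule correct             a                       b
        first rule incorrect           c                       d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Phi is the Pearson correlation between the two binary variables.
    phi = (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)
    # For a 2 x 2 table the Pearson chi-square equals n * phi**2.
    chi2 = n * phi ** 2
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, phi, p_value

# Hypothetical counts, for illustration only.
chi2, phi, p_value = chi_square_and_phi(10, 20, 30, 40)
```

The identity between the chi-square statistic and the squared correlation shows why a statistically significant association can still go together with a small correlation when the number of observations is large.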

Second, we report the results for the accuracy data. Here, we investigate the condition effect by studying the interaction between item type (inside or above rule) and condition with accuracy as a dependent variable.

Third, we report whether participants in different conditions consider different rules on their first confrontation with an item. This will be done by checking the main effect of condition on the frequency of inside responses. The frequency of above responses is not included in this analysis since it is almost linearly dependent on the number of inside responses. Hence, taking (inside, above) as a within-subject variable would be problematic because of the very strong dependence between the number of inside and above responses. Since inside and above are almost linearly dependent (almost because one can also choose an “other” response; see Table 2 and its discussion later), a preference of condition 1 for inside relative to condition 2 should show up as a main effect of the condition with number of inside choices as a dependent variable. This analysis may be misleading in case the number of “other” responses differs greatly between conditions. However, this number is low and exactly equal for both conditions (see Table 2 and its discussion). This analysis will be referred to as the choice data analysis.

Note that in principle the choice and accuracy data provide different types of information, because we investigate the first choice upon presentation of an item (although there may be a correlation between first choice accuracy and final accuracy, as discussed earlier in the first paragraph of Data analysis). If we investigated the last choice instead, these two types of information would be redundant. The fourth type of analysis is discussed in the following subsection.

Model-based analysis. As noted earlier, the equations (1) and (2) only incorporate information from the past, not from the item itself. This is quite unrealistic for items of series 1 and 2, since the rule is easily “seen” there, based upon the information in the item. On the other hand, for items of series 3 to 5, it may very well be plausible that stimulus information (from the present item) does not influence the rule sampling probabilities; as the reader can check, it is difficult to use stimulus information in these items to find the correct rule. Hence, the model as given in (1) and (2) will be used to analyse items of series 3, 4, and 5 only. Nevertheless, feedback obtained in series 2 will be incorporated into the model via the factor $\sum_{j=1}^{i-1} x_{jk}$ (in equation 1) or via the factor $\sum_{j=1}^{i-1} x_{jk}\,(i-j)^{\gamma}$ (in equation 2) since the index j is taken to start from the first item in series 2.

Data will be analysed for each condition separately, so there are (10 + 10 + 12) × 6 = 192 data points per analysis. Parameters will be estimated by maximising the likelihood function, and the standard error of each parameter will be estimated.² Further, for each model the Akaike Information Criterion (AIC; Akaike, 1974) will be calculated, which is defined as $\mathrm{AIC} = -2\ln(L) + 2M$, where L denotes the likelihood function (evaluated in the estimated parameters) and M the number of freely estimated parameters. It is common to select the model with the lowest AIC value as the best fitting one. Further, since the models (1) and (2) are nested (model 1 equals model 2 with $\gamma$ restricted to zero), it is possible to test the relative fit of the two models. Specifically, in the present case, the variable $X = -2\ln(L_1) - [-2\ln(L_2)]$ should follow a chi-square distribution with one degree of freedom under model (1), where $L_1$ and $L_2$ denote the likelihood functions for model (1) and (2) respectively. A high value of X is indicative of the presence of a lag effect (that is, $\gamma \neq 0$).
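The two model-comparison tools just described can be sketched in a few lines of Python. The log-likelihood values below are hypothetical, and the parameter counts (8 for model 1, 9 for model 2) are an assumption based on two free initial strengths, six learning parameters, and the single lag parameter.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln(L) + 2 M."""
    return -2.0 * log_likelihood + 2.0 * n_params

def lag_likelihood_ratio_test(loglik_model1, loglik_model2):
    """X = -2 ln(L1) - [-2 ln(L2)] with its chi-square (1 df) p-value."""
    x = -2.0 * loglik_model1 - (-2.0 * loglik_model2)
    # Upper tail of a chi-square with 1 degree of freedom; max() guards
    # against a tiny negative x caused by optimiser noise.
    p_value = math.erfc(math.sqrt(max(x, 0.0) / 2.0))
    return x, p_value

# Hypothetical maximised log-likelihoods for models (1) and (2).
loglik_1, loglik_2 = -160.0, -157.0
aic_1 = aic(loglik_1, 8)   # assumed: 2 lambdas + 6 betas
aic_2 = aic(loglik_2, 9)   # assumed: model 1 plus gamma
x, p_value = lag_likelihood_ratio_test(loglik_1, loglik_2)
```

In this made-up example model (2) has the lower AIC and the likelihood-ratio statistic is 6, so both criteria would point toward keeping the lag parameter.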

² Standard errors can be estimated by evaluating the (Hessian) matrix of second-order derivatives at the maximum likelihood estimators. The square root of the diagonal values of minus the inverse of this matrix provides an estimate of the standard errors (e.g., Schervisch, 1995).


Results

Association data. In condition 1, the Pearson chi-square value for association between accuracy on the first response and (final) accuracy equals X² = 9.852, p = .002. This is statistically significant, but the correlation equals only .273. Hence, it seems warranted to study choice and accuracy separately. In condition 2, we obtain X² = 0.738, p = .394, with a correlation of –.075. Hence, a separate analysis is required here as well.

Accuracy data. Table 1 shows the mean accuracies for both response types and both conditions. The interaction between condition and item type is significant, F(1, 10) = 12.500, p = .005.

Choice data. Table 2 shows, for all participants and both conditions, the number of inside, above, and other responses. It can be seen that participants in condition 1 prefer response inside, whereas participants in condition 2 prefer response above. The number of other responses is low and is equal for both conditions (4/72). An independent samples t-test on the inside responses yields an effect for the condition, t(10) = 4.969, p = .001.

TABLE 1
Proportions of success (accuracy data) in series 5, Experiment 1

     Condition 1            Condition 2
  Inside    Above        Inside    Above
   .94       .72          .89       .94

TABLE 2
Frequencies of responses (choice data) in series 5, Experiment 1

              Condition 1                             Condition 2
Participant   Inside   Above   Other    Participant   Inside   Above   Other
1             9        2       1        1             3        7       2
2             10       2       0        2             3        8       1
3             7        5       0        3             4        8       0
4             8        4       0        4             6        6       0
5             8        4       0        5             5        7       0
6             7        2       3        6             6        5       1
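The reported t-test can be retraced from the inside frequencies in Table 2. The sketch below assumes the classical equal-variance (pooled) form of the independent samples t-test, which is consistent with the 10 degrees of freedom reported; the function name is illustrative.

```python
import math

# Number of inside responses per participant in series 5 (Table 2).
inside_condition_1 = [9, 10, 7, 8, 8, 7]
inside_condition_2 = [3, 3, 4, 6, 5, 6]

def pooled_t(sample_a, sample_b):
    """Independent samples t statistic with pooled variance, and its df."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    # Sums of squared deviations from each group mean.
    ss_a = sum((v - mean_a) ** 2 for v in sample_a)
    ss_b = sum((v - mean_b) ** 2 for v in sample_b)
    df = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / df
    standard_error = math.sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, df

t_value, df = pooled_t(inside_condition_1, inside_condition_2)
print(round(t_value, 3), df)  # 4.969 10
```

The group means are about 8.17 and 4.50 inside responses out of 12, and the statistic agrees with the value reported in the text.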


Model-based analysis. A first point of interest is whether the lag parameter is needed: Do people just add all information from previous items, or do they weigh previous items according to recency? To investigate this, we calculated the AIC value for models (1) and (2) for the condition 1 data. These are 331.428 and 329.244 respectively, suggesting that the lag model performs better. Further, since the models are nested, it is possible to statistically test the necessity of the lag parameter as described earlier. The corresponding statistic X equals 4.184, p = .041. In condition 2, the AIC values are 385.800 and 364.146, for models (1) and (2), respectively. Further the statistic X equals 23.654, p < .001. Hence, in both conditions a lag parameter seems to be needed. In the following, the estimated parameters of this lag model will be considered.
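The reported statistics are mutually consistent, as a short worked check shows. Writing $M_1$ and $M_2$ for the numbers of free parameters of models (1) and (2), and assuming $M_2 = M_1 + 1$ because the lag model adds only the single parameter for the lag effect, the definition of the AIC gives

$$
X = -2\ln(L_1) + 2\ln(L_2) = \mathrm{AIC}_1 - \mathrm{AIC}_2 + 2\,(M_2 - M_1) = \mathrm{AIC}_1 - \mathrm{AIC}_2 + 2 .
$$

$$
\begin{aligned}
\text{Condition 1:}\quad & 331.428 - 329.244 + 2 = 4.184,\\
\text{Condition 2:}\quad & 385.800 - 364.146 + 2 = 23.654.
\end{aligned}
$$

Both values match the test statistics given in the text.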

Table 3 shows the estimated parameters of the lag model together with theirestimated confidence intervals, for each condition separately. Several things areto be noted about this table. First, regarding the initial strength parameters l,note that we have restricted one of these for purposes of identification. Second,the experimental analysis already indicated that there is a learning effect in thedata. The present model-based analysis suggests that learning occurs for allparticipants, since no b confidence interval contains zero. Furthermore, theanalysis shows that there are also individual differences in learning speed, sincesome estimated b parameters are outside the confidence intervals of other (b)parameters. However, comparison of the magnitudes of the b values over con-ditions is not meaningful since the g parameter is very different across condi-tions, which is discussed next. Third, the g parameter has a positive value incondition 1, which seems to indicate that earlier items have a stronger impact

TABLE 3
Estimated parameters and confidence intervals for lag model, Experiment 1

             Condition 1                     Condition 2
Parameter    Estimate   95% CI               Estimate   95% CI
l1           –0.655     (–1.023, –0.287)     –1.026     (–1.377, –0.675)
l2           –1.072     (–1.511, –0.633)     –1.043     (–1.351, –0.735)
l3            0*        (0, 0)*               0*        (0, 0)*
b1            0.012     (0.006, 0.018)        0.472     (0.125, 0.819)
b2            0.028     (0.012, 0.044)        0.961     (0.557, 1.365)
b3            0.019     (0.009, 0.029)        1.347     (0.835, 1.859)
b4            0.017     (0.007, 0.027)        1.317     (0.815, 1.819)
b5            0.012     (0.006, 0.018)        0.822     (0.444, 1.200)
b6            0.006     (0.002, 0.010)        1.075     (0.644, 1.506)
g             1.006     (0.928, 1.084)       –0.872     (–1.001, –0.743)

* Restricted.


than more recent items. This seems counterintuitive, but it is perhaps best explained by the fact that the very first two items over which the index j in equation (2) ranges are of type inside, and so the model seems to pick up the influence of these two items. In condition 2, on the other hand, the parameter g is negative and thus a recency effect occurs in the sense that more recent items have a larger influence than earlier items.

Discussion

We have found a reasonably strong learning, or mental set, phenomenon in these simplified RPM items using rules inside and above, in both choice and accuracy data. Both conditions of the experiment had received both rules earlier (7 and 13 times for the infrequent and frequent rule respectively), so the effect cannot be due to the fact that people in each group knew only one of the particular rules (inside or above in condition 1 and 2 respectively). Our finding is that, if one rule is used more often than the other one, the first one will be considered more frequently than the other (but not always) later on. An account of this finding in terms of activation of the rules, as we have given in this paper, seems a plausible one. Moreover, it was found that earlier items may be weighted differently than more recent ones.

It is useful to mention that this experiment rules out the possibility that people learn in the RPM test simply because they get used to the general test format. If that were so, then it would not matter exactly which rules people would be trained on, and no main effect in the choice data or interaction in the accuracy data would have been found. The fact that there is an effect indicates that learning in this test is rule-specific, that is, it is due to getting used to using specific rules.

A possible drawback of the current experiment is that the items and the procedure differ from the RPM. We will concentrate on two points. First, the testee always receives explicit information about whether or not the item was solved correctly. This is different from the procedure in the real RPM: There, one solves the items in silence without ever receiving explicit feedback about the correctness of the chosen alternative. Nevertheless, we think that in the RPM test as well, partial feedback is operating, namely for items where a correct response was given. Indeed, choosing a certain rule to solve the item and finding a response alternative that matches this rule provides the information that the rule chosen is the correct one. The idea is that, if one chooses an incorrect rule, the probability is low that one also finds a matching response alternative below. Hence, we believe that the results of Experiment 1 transfer to the RPM situation since in case the response is correct, implicit feedback is given about the correctness of the rule (note that the presence of feedback is critical in learning the rules). Still, it remains to be seen whether our results generalise to situations without explicit feedback.


Second, the items in Experiment 1 were chosen such that it was very difficult to determine, from a quick glance at the item, which rule (inside or above) is the relevant one. This made model development quite easy, since no perceptual components were required in the model to account for the effect that features of the items may have. However, the items in the RPM often give a clue as to which rule is the relevant one, so that our “no perceptual cues” assumption may not be valid for real RPM items. The models (1) and (2) cannot adequately handle this situation.

More generally, the items of Experiment 1 were inspired by the RPM test, but they were different in important ways. In Experiment 2, we will start from the 36 real RPM items and only make small adjustments to these items if necessary. More specifically, concerning the two points noted earlier, no explicit feedback is given in the new items and some of the items hint clearly as to which rule should be used. We expect a similar pattern of results in this new situation as in the previous experiment, even though the testing situation is a more complex one.

EXPERIMENT 2

Here, we start from the task analysis performed by Carpenter et al. (1990). These authors have distinguished five rule types that are used in different RPM items: distribution of 3, distribution of 2, progression, constant in a row, and addition. The addition rule was divided into three different rules in the previous experiment, namely, addition, unique, and common. Almost all RPM items can be described using this set of principles. The constant rule will not be used in our experiment. The other rules not mentioned before will be explained in Description of the items later on.

Starting from the RPM, we make two conditions, one in which one set of rules is learned (e.g., addition, unique) and the other in which the complementary set of rules is learned (e.g., distribution of 3, progression). This is done by adapting real RPM items as needed; see Method later. After both groups have received their (different) set of items, we present items of both types (e.g., both addition and distribution of 3 items) to both groups. Then, we predict that there will be an effect on the dependent variables choice of rule and probability of success as before. Specifically, each group will choose principles they have used before and will be better on the items that follow these principles.

Since in this case the items differ largely as to their complexity, and hence, in how easy it is to see the rule, an item complexity factor (or perceptual factor) will be included. This will be done by adding to equations (1) and (2) a factor that takes into account complexity effects. We extend model (1) in the following way:


$$
\Pr(Y_i = k) = \frac{\exp\left[ l_k + b_p \sum_{j=1}^{i-1} x_{pjk} + a\, I(k, i) \right]}{\sum_{m=1}^{K} \exp\left[ l_m + b_p \sum_{j=1}^{i-1} x_{pjm} + a\, I(m, i) \right]} \tag{3}
$$

Since we do not give explicit feedback in the present experiment, the variable $x_{pik}$ is now made person dependent, referring to whether item i was of type k and person p solved it correctly. The indicator variable I(k, i) takes the value of 1 if only rule k has to be applied and that rule has to be applied only once in the item in question. For example, if rule 1 is applied once in item i and there are no other rules involved in this item, then I(1, i) = 1 and I(k, i) = 0 for k ≠ 1. If rule 1 is applied twice in item i then I(k, i) = 0 for all k. If rule 1 and rule 2 are both necessary in item i, then again I(k, i) = 0 for all k. The idea here is that, if many rule tokens appear in the same item, the correct rule is less easily noticeable, so the probability of response k is lower if other rule tokens are present. Hence, this extra factor introduces a kind of perceptual component into the model, which is scaled by the parameter a. Suppose that a is larger than zero: Then, if I = 1, the probability of success will be higher than if I = 0. Admittedly, the indicator variable I does not refer to explicit item features hinting at the correct rule, but the feature effects are nevertheless modelled indirectly with this procedure. Indeed, our focus is on the learning effects (b) so we thought it would not be necessary to analyse the item features in too much detail.
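The definition of the indicator I(k, i) can be made concrete with a short Python sketch. It assumes, purely for illustration, that an item is described by a list of rule tokens with one entry per rule application; this representation and the function name are hypothetical and not taken from the paper.

```python
RULES = ["addition", "common", "unique", "progression", "rotation", "D3"]

def indicator(rule, item_tokens):
    """Return I(k, i): 1 if `rule` is the only rule token in the item, else 0."""
    return int(list(item_tokens) == [rule])

# Hypothetical items, each described by its list of rule tokens.
single = ["rotation"]                 # one rule, applied once
repeated = ["rotation", "rotation"]   # the same rule applied twice
mixed = ["rotation", "D3"]            # two different rules

for item in (single, repeated, mixed):
    print(item, [indicator(k, item) for k in RULES])
```

Only the first item yields a 1 (for rotation); the other two give 0 for every rule, matching the three cases described above.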

To introduce the lag effect, model (2) is extended as follows:

$$
\Pr(Y_i = k) = \frac{\exp\left[ l_k + b_p \sum_{j=1}^{i-1} x_{pjk} (i - j)^{g} + a\, I(k, i) \right]}{\sum_{m=1}^{K} \exp\left[ l_m + b_p \sum_{j=1}^{i-1} x_{pjm} (i - j)^{g} + a\, I(m, i) \right]} \tag{4}
$$

in complete analogy with equation (3).
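A minimal Python sketch of how the choice probabilities of model (4) could be computed is given here. It is not the authors' estimation code; the function name, the argument names, and the toy history are assumptions made for illustration. Setting the lag exponent g to zero makes every weight (i – j)^g equal to 1, so the same function also covers model (3).

```python
import math

def choice_probabilities(i, history, init_strength, learn_rate,
                         lag_exp, percept_weight, indicator):
    """Probabilities Pr(Y_i = k) for every rule k under the lag model (4).

    i              -- 1-based index of the current item
    history        -- history[j-1][k] is x_pjk (1 if item j was of type k
                      and was solved by the person, else 0)
    init_strength  -- list of initial strengths l_k, one per rule
    learn_rate     -- the person's learning rate b_p
    lag_exp        -- lag exponent g; 0 reduces the model to model (3)
    percept_weight -- weight a of the perceptual indicator
    indicator      -- list of I(k, i) values for the current item
    """
    strengths = []
    for k in range(len(init_strength)):
        learned = sum(history[j - 1][k] * (i - j) ** lag_exp
                      for j in range(1, i))
        strengths.append(init_strength[k] + learn_rate * learned
                         + percept_weight * indicator[k])
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(strengths)
    weights = [math.exp(s - top) for s in strengths]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical history for two rules: rule 0 solved on items 1 and 2,
# rule 1 solved on item 3. These are not data from the experiments.
history = [[1, 0], [1, 0], [0, 1]]
no_lag = choice_probabilities(4, history, [0.0, 0.0], 0.5, 0.0, 0.0, [0, 0])
recency = choice_probabilities(4, history, [0.0, 0.0], 0.5, -1.0, 0.0, [0, 0])
print(no_lag, recency)
```

In this toy case, rule 0 is favoured when g = 0 because it was solved twice, whereas with g = –1 the single, more recent success on rule 1 outweighs the two older successes. This is the recency pattern that a negative g describes.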

Method

Description of the items. We will now discuss the Carpenter et al. (1990) RPM rule system in more detail and discuss the adaptations we made for our test. We made those adaptations based on informal assessments concerning the level at which participants generally describe a rule. For example, if two rules R1 and R2 are not usually distinguished by participants, they are treated as one rule. However, the exact formulation of a rule is not too critical since participants were pre-exposed to either one of two sets of rules, and two rules of different sets were always conceptually very different. This will become more clear later on.

The distribution of 3 (D3) rule involves the fact that three figures (e.g., a circle, a square, and a triangle) appear in the three elements of every row: One of these figures appears in each element. For example, if the first row consists of the sequence “circle-square-triangle”, the second row of the sequence “triangle-square-circle”, and the third row of the sequence “square-circle-triangle”, this might be an instantiation of the D3 rule. The distribution of 2 (D2) rule means that the same figures appear in only two out of three elements, as in the sequence “square-triangle-Ø”, where Ø denotes that no figure is shown. The D2 and D3 rules will not be distinguished in the following analysis and will (together) be referred to as the D3 rule. Indeed, the element “Ø” may be considered a third figure, so that formally speaking D2 and D3 can be seen as equivalent.
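The D3 rule, including the treatment of D2 as a special case, can be expressed as a simple check on the rows of an item. The following Python sketch assumes that each row is given as a list of three figure labels, with “Ø” standing for an empty element; the function name and the second example are illustrative only.

```python
def follows_d3(rows):
    """True if every row contains the same three distinct figures, one per element."""
    figures = set(rows[0])
    return len(figures) == 3 and all(
        len(row) == 3 and set(row) == figures for row in rows
    )

# The example sequences from the text.
d3_item = [
    ["circle", "square", "triangle"],
    ["triangle", "square", "circle"],
    ["square", "circle", "triangle"],
]

# A hypothetical D2 item: the empty element "Ø" acts as the third figure.
d2_item = [
    ["square", "triangle", "Ø"],
    ["Ø", "square", "triangle"],
    ["triangle", "Ø", "square"],
]

print(follows_d3(d3_item), follows_d3(d2_item))
```

Because “Ø” is handled like any other label, the D2 item passes the same check, which is the sense in which D2 and D3 are formally equivalent.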

Another type of rule is progression, which involves the fact that one of the figures is undergoing some kind of transformation throughout the row (e.g., it becomes progressively smaller). This rule is divided into two separate rules, rotation and progression (where “progression” covers all Carpenter et al. (1990) progression instances except rotation). Finally, we have the rules that were already used in the previous experiment, namely, addition, unique, and common, the last two of which are treated as one rule by Carpenter et al. (1990). Hence, we will work with the set of rules {addition, common, unique, progression, rotation, D3}. See the examples in Figures 1, 2, and 3; Figures 1 and 2 show actual items of the newly constructed test, and Figure 3 shows an item that is very similar to an actual item (see also the discussion of these items earlier in the paper).

Design. Items of condition 1 are governed by the following three principles: progression, rotation, or D3. If the ith item of the (original) RPM test is a progression, rotation, or D3 item, then this item is taken as the ith item of condition 1. If none of these three rules were needed in the original RPM item, a new item was constructed from the same figural parts as item i, but governed by one of these three rules.

Similarly, condition 2 items are governed by the principles addition, common, and unique. Also, if the ith item of the (real) RPM belongs to this class, it is incorporated as the ith item of condition 2. If none of these three rules are needed (in the original RPM item), a new item is constructed from the same figural parts as item i, but governed by the addition, common, or unique rule. If an item is governed by rules from both the sets {progression, rotation, D3} and {addition, common, unique}, the item is “split in two”, such that each split item consists of rules from only one of the two sets. The split items are then assigned to the corresponding condition.
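The assignment of original RPM items to the two conditions can be summarised in a small Python sketch. It assumes that each original item is described by the list of rules it requires; the function name and the return convention (None meaning that a new item has to be constructed from the same figural parts) are illustrative choices, not part of the original procedure.

```python
SET_1 = {"progression", "rotation", "D3"}      # condition 1 rules
SET_2 = {"addition", "common", "unique"}       # condition 2 rules

def assign_item(rule_tokens):
    """Split the rules of an original item over the two conditions.

    Returns a dict mapping condition number to the rules kept for that
    condition, or None when a new item must be constructed for it.
    """
    part_1 = [rule for rule in rule_tokens if rule in SET_1]
    part_2 = [rule for rule in rule_tokens if rule in SET_2]
    return {1: part_1 or None, 2: part_2 or None}

print(assign_item(["rotation"]))             # used as is in condition 1
print(assign_item(["addition", "D3"]))       # item is split in two
```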


All participants are first given 34 condition-specific items. Afterwards, the same four items are presented to every participant. Two of these four are built using the unique and common rules, whereas the other two items follow the D3 and rotation principles. The unique and common items are the items 35 and 36 of the (original) RPM test. The other two items were constructed by ourselves, following the principles discussed earlier. One of these is the rotation item which is shown in Figure 2. We expect that participants should be better on the items governed by the rule types that were activated in their condition.

Two matching items of condition 1 and 2 are always built from the same figural elements. In condition 1, there are 17 items with single rule instantiations, i.e., items such that I(k, i) = 1; in condition 2, there are 34 such items (the four common items are included in this count). This asymmetry is the price we pay for staying as close as possible to the real RPM items.

Participants in condition 1 have been primed toward the progression, rotation, and D3 rules, the rules which are involved also in items 36 and 38. Participants in condition 2, on the other hand, have received training in the addition, common, and unique rules, which are involved also in items 35 and 37. Hence, we predict that participants in condition 1 are better on items 36 and 38 (than participants in condition 2), whereas participants in condition 2 are better on items 35 and 37 (than participants in condition 1). Hence, we again predict an interaction effect. In a similar vein, we expect a main effect in the choice data.

The set effect of items 35 and 36 is not necessarily equal to that in items 37 and 38. The difference is that, while solving items 35 and 36, the problem-solving set is probably stronger than that in items 37 and 38. This is because, while solving the items 35 and 36, the participants have seen only 0 or 1 item(s) that do not conform to the created set, whereas they have seen 1 or 2 item(s) that do not conform to the set while solving items 37 and 38. Indeed, the problem-solving set literature indicates that presenting even one non-set item can be enough to eliminate its effect (e.g., Luchins & Luchins, 1954). Therefore, we will investigate the expected interaction for items 35 and 36 and items 37 and 38 separately.

Participants. Eight persons participated in each condition (so N = 8 × 2 = 16). Each of these received course credit.

Procedure. Participants received a computerised test with either the 34 items of condition 1 or the 34 items of condition 2, followed by the same last four items, as described previously. In order to choose a response alternative, participants had to move the mouse arrow to a response alternative and click it. Then they moved the arrow to a button that said “Next”. Clicking this button brought up the next item. Participants were selected to have at least a minimal “mouse clicking ability”. Contrary to the previous test, the items were not presented in series; that is, they were presented one after the other without breaks. Also in contrast with the previous experiment, the experimenter never intervened during the test taking. Finally, no time limits were imposed since there is no (item specific) time limit in the real RPM test either.

Each participant solved 38 items as described earlier. Participants were required to do two things: Think aloud about the solution and click the answer alternative they thought was the correct one. The former process was recorded on a tape recorder. Furthermore, the computer recorded which completion response was chosen.

Data analysis. Again, four types of analysis are performed. First, the association data results are calculated as before. Second, for the accuracy data, define A1 to be the number of successes on items 35 and 37 together (a score ranging from 0 to 2), and A2 to be the number of successes on items 36 and 38 (a score from 0 to 2). Then, we predict an interaction between the (within-subject) variable (A1, A2) and condition. We can also investigate the effect for items 35 and 36 only, or for items 37 and 38 only. This makes sense because items 35 and 36 are the ones appearing immediately after the set-inducing items (1, . . . , 34). Items 37 and 38 appear only later, after the set is possibly broken. Since in the latter case we have a binary dependent variable, it is not very appropriate to use an ANOVA. Therefore, we will also use a permutation test (in addition to the ANOVA). Let us say the test is to be performed for items 35 and 36. (The analysis for items 37 and 38 is similar.) We first calculate the statistic T = (n_{1,35} × n_{2,36}) / (n_{1,36} × n_{2,35}), where n_{k,i} denotes the number of 1s in condition k on item i. The total number of 1s and 0s is fixed and a random permutation of the data is generated. This procedure is repeated 1000 times, and the proportion of Ts calculated in the replicated data that is smaller than the observed T is the resultant p-value.
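The permutation test can be sketched in Python as follows. The text does not spell out exactly how the data are permuted, so the sketch assumes one possible reading: all binary scores of the four (condition, item) cells are pooled and randomly redistributed over the cells. The success counts are derived from the proportions in Table 4 with eight persons per condition (for example, .375 × 8 = 3); the function names and the treatment of a zero denominator are illustrative choices.

```python
import random

def t_statistic(cells):
    """T = (n_1a * n_2b) / (n_1b * n_2a) for success counts per (condition, item) cell."""
    denominator = cells[(1, "b")] * cells[(2, "a")]
    if denominator == 0:
        return float("inf")  # assumption: treat a zero denominator as an infinite T
    return cells[(1, "a")] * cells[(2, "b")] / denominator

def permutation_p_value(cells, n_per_cell, n_perm=1000, seed=0):
    """Proportion of permuted data sets whose T is smaller than the observed T."""
    rng = random.Random(seed)
    keys = [(1, "a"), (1, "b"), (2, "a"), (2, "b")]
    total_ones = sum(cells[key] for key in keys)
    pool = [1] * total_ones + [0] * (len(keys) * n_per_cell - total_ones)
    observed = t_statistic(cells)
    smaller = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        permuted = {
            key: sum(pool[idx * n_per_cell:(idx + 1) * n_per_cell])
            for idx, key in enumerate(keys)
        }
        if t_statistic(permuted) < observed:
            smaller += 1
    return smaller / n_perm

# Item "a" is item 35 and item "b" is item 36; counts follow from Table 4.
cells = {(1, "a"): 3, (1, "b"): 7, (2, "a"): 6, (2, "b"): 6}
print(t_statistic(cells), permutation_p_value(cells, n_per_cell=8))
```

Because the permutation scheme is an assumption and the result depends on the random draws, this sketch should not be expected to reproduce the reported p-values exactly.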

Third, for the choice data, we will only consider the very first rule response on every item, as in the previous experiment. The range of possible rule responses is potentially much larger in this experiment, since we did not intervene during the testing process. We restrict our attention to seven possible rules: the six rules mentioned previously, plus a rest category, which applies for all remaining responses.

We calculate, per item, the number of responses coming from set 1, that is, from the set {progression, rotation, D3}. Similarly, set 2 contains the rules {addition, common, unique}. Again, there is an almost linear dependence between the number of rules chosen from set 1 and 2, so we do not incorporate set 2 in the analysis and simply look at the main effect of condition on the number of responses from set 1 per item. These numbers can be aggregated (over the items 35, 36, 37, and 38), resulting in a score of 0 to 4 per person. Finally, the fourth type of analysis is discussed in the following paragraph.
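The aggregation of the choice data into a score per person can be illustrated with a short Python sketch. The dictionary format for a participant's first rule responses and the example protocol are hypothetical; they are not data from the study.

```python
SET_1 = {"progression", "rotation", "D3"}

def set1_score(first_responses, items=(35, 36, 37, 38)):
    """Number of the given items on which the first rule response came from set 1."""
    return sum(1 for item in items if first_responses.get(item) in SET_1)

# Hypothetical first rule responses of one participant (not data from the study).
participant = {35: "unique", 36: "rotation", 37: "other", 38: "D3"}
print(set1_score(participant))                   # score over all four items
print(set1_score(participant, items=(35, 36)))   # restricted to items 35 and 36
```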


Model-based analysis. We will estimate the parameters and their confidence intervals for models (3) and (4) in the way described earlier. However, since there are more possible responses here than in the first experiment, and since these data indicated that initial strength estimation is unstable, the initial strengths of all rules except the “remainder” category were assumed to be equal. Hence, there are now two initial strength parameters to be estimated. Concerning model fit, we again calculate the AIC for both models and investigate their relative fit by comparing their respective values of –2 ln(L) with the statistic X introduced earlier.

Results and discussion

Association data. For condition 1, the Pearson association measure between final success and performance on the first response on an item equals X² = 28.34, df = 1, p < .001. The correlation equals r = .305. In condition 2, X² = 100.38, df = 1, p < .001, r = .575. Hence, the accuracy on the first response is related to success on the item, but the association is certainly not perfect.
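For a 2 × 2 table, the correlation (phi coefficient) follows from the Pearson statistic as r = sqrt(X²/N). The Python sketch below checks the reported values under the assumption, which is not stated explicitly in the text, that each condition contributes N = 8 participants × 38 items = 304 observations.

```python
import math

def phi_from_chi_square(chi_square, n_obs):
    """Phi coefficient of a 2 x 2 table, given its Pearson chi-square statistic."""
    return math.sqrt(chi_square / n_obs)

n_obs = 8 * 38  # assumed number of person-by-item observations per condition

for label, chi_square in (("condition 1", 28.34), ("condition 2", 100.38)):
    print(label, round(phi_from_chi_square(chi_square, n_obs), 3))
```

With this N, the two statistics correspond to correlations of .305 and .575, in agreement with the reported values.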

Accuracy data. Table 4 shows the accuracies for both conditions and all four items common to condition 1 and 2. As noted earlier, A1 is the variable denoting the number of successes on items 35 and 37 per person. A2 is the analogous variable for items 36 and 38. The interaction between the variables (A1, A2) and condition is not significant, F(1, 14) = 1.000, p = .334. However, if the effect is investigated for items 35 and 36 only, this does result in a significant effect, F(1, 14) = 7.000, p = .019. The corresponding permutation test yields p = .047. For items 37 and 38, the ANOVA yields F(1, 14) = 0.226, p = .642, and the permutation test p = .233. Hence, the effect seems to be present in items 35 and 36 but not in 37 and 38. The discrepancy is not unexpected: We analysed items 35, 36 and items 37, 38 separately precisely because “set-breaking” items have already appeared by the time the latter two are being solved. It appears that one or two items not conforming to the rule may break a set. Such an effect,

TABLE 4
Proportions of success (accuracy) aggregated over persons, on items 35–38, Experiment 2

               Item
               35      36      37      38
Condition 1    .375    .875    .125    .625
Condition 2    .750    .750    .375    1.00


however, was not present in Experiment 1. We discuss this discrepancy more fully in the General Discussion.

Choice data. Table 5 shows the number of times that a first response was from the set {progression, rotation, D3} (denoted “condition 1 rules” in Table 5 because condition 1 was pre-exposed to these) or from the set {addition, common, unique} (denoted “condition 2 rules”). The number of “other” responses is higher than in Experiment 1, but about equal for both conditions (10/32 and 8/32 for condition 1 and 2 respectively). Define S1 to be the number of responses from the set {progression, rotation, D3}. Then, with S1 as a dependent measure, an independent samples t-test for condition 1 versus condition 2 yields t(14) = 2.729, p = .016. Since there is a difference for items 35 and 36 on the one hand and 37 and 38 on the other in the accuracy data, it is worthwhile to study this difference here also. Hence, S1 is restricted to items 35 and 36. This results in t(14) = 2.017, p = .063. For items 37 and 38, t(14) = 2.546, p = .023. Hence, for the choice data the effect seems to be present for both item sets {35, 36} and {37, 38}.

Model-based analysis. First of all, models without the a parameters have a much worse fit than either model (3) or (4) (with an a parameter), and will not be discussed here.

In condition 1, the AIC values for models (3) and (4) are 772.092 and 771.802 respectively. Hence, according to this criterion, the lag model should be chosen again. However, the statistic X equals 2.290, p = .130, so the addition of the lag parameter is probably not that important in this condition. In condition 2, the corresponding AIC values are 756.996 and 753.416. The statistic X reaches a value of 5.580, p = .018. So the conclusion is here again that it makes sense to introduce the lag parameter.

Finally, we discuss the parameter estimates of the lag model, displayed in Table 6. As in the previous experiment, learning occurs and there are individual

TABLE 5
Frequencies of responses (choice), aggregated over persons, on items 35–38, Experiment 2

               Condition 1 rules                Condition 2 rules
               {Progression, Rotation, D3}      {Addition, Common, Unique}
Item           35    36    37    38             35    36    37    38
Condition 1     4     6     3     8              0     0     1     0
Condition 2     0     5     0     6              6     1     4     2


differences in learning, as is evident from the fact that some of the b parameters are not contained in the confidence intervals of some other bs. Further, a few b confidence intervals contain the value of zero, suggesting that these people did not learn at all in this test. Also, in this experiment, the lag parameter g is in both conditions negative, which is indicative of a recency effect.
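Which learning rate intervals contain zero can be read off Table 6 mechanically. The Python sketch below copies the b confidence intervals from the two column groups of Table 6; the labels “left” and “right” refer to those column groups, and the helper name is illustrative.

```python
# 95% confidence intervals of b1..b8, copied from the two column groups of Table 6.
B_INTERVALS = {
    "left": [(0.304, 0.630), (0.090, 0.360), (-0.311, 0.231), (0.276, 0.692),
             (0.031, 0.353), (-0.148, 0.198), (0.427, 0.873), (0.189, 0.675)],
    "right": [(0.384, 0.827), (0.152, 0.602), (-0.252, 0.376), (0.313, 0.713),
              (0.285, 0.800), (0.286, 0.666), (0.038, 0.582), (-0.191, 0.357)],
}

def intervals_containing_zero(intervals):
    """Names of the b parameters whose confidence interval includes zero."""
    return [f"b{idx}" for idx, (low, high) in enumerate(intervals, start=1)
            if low <= 0.0 <= high]

for group, intervals in B_INTERVALS.items():
    print(group, intervals_containing_zero(intervals))
```

In each column group, two of the eight intervals include zero (b3 and b6 on the left, b3 and b8 on the right).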

GENERAL DISCUSSION

In this paper, we have analysed data of participants solving RPM-like items. Specifically, we have looked at whether people profit from rules used earlier in the test and whether a lag effect occurs.

In Experiment 1, it was shown that people will try out a rule with higher probability if it has been activated in previous items than when it has not. The probability of success is influenced similarly. In Experiment 2, a test which closely followed the RPM format was constructed. It was shown that the learning effect was present here too, but less strongly so than in Experiment 1. Indeed, the effect was no longer visible in the accuracies of items 37 and 38, although the choice data of these items still showed the expected trend. This suggests that the set effect is still present, but more weakly so. So why is the set so easily broken (or weakened) in Experiment 2 but not in Experiment 1? One possible reason is that explicit feedback was provided by the experimenter in Experiment 1 but not in Experiment 2. Participants are probably more confident about rules that are provided by the experimenter than about rules they have

TABLE 6
Estimated parameters and confidence intervals for lag model, Experiment 2

Parameter    Estimate   95% CI               Parameter    Estimate   95% CI
l1           –0.229     (–0.501, 0.043)      l1           –1.068     (–1.327, –0.809)
l2            1.889     (1.617, 2.161)       l2            1.358     (1.099, 1.617)
b1            0.467     (0.304, 0.630)       b1            0.605     (0.384, 0.827)
b2            0.225     (0.090, 0.360)       b2            0.377     (0.152, 0.602)
b3           –0.040     (–0.311, 0.231)      b3            0.062     (–0.252, 0.376)
b4            0.484     (0.276, 0.692)       b4            0.513     (0.313, 0.713)
b5            0.192     (0.031, 0.353)       b5            0.542     (0.285, 0.800)
b6            0.025     (–0.148, 0.198)      b6            0.476     (0.286, 0.666)
b7            0.650     (0.427, 0.873)       b7            0.310     (0.038, 0.582)
b8            0.432     (0.189, 0.675)       b8            0.083     (–0.191, 0.357)
g             3.180     (2.782, 3.578)       g             2.523     (2.270, 2.776)
a            –0.227     (–0.311, –0.143)     a            –0.440     (–0.534, –0.346)


discovered themselves. Hence, if such a self-discovered rule no longer appears to apply, participants may reject this rule, or at least consider other rules as well, after the initial consideration of the earlier rules. This is consistent with our finding that, for items 37 and 38, the set effect in Experiment 2 is present in the choice data but not in the accuracy data. Although these data are too limited in scope to allow generalisations, this result suggests that at least in some cases people may profit more from rules that are provided explicitly rather than found by themselves because the former rules are used more confidently. It remains an open question whether this holds in other problem solving tasks as well.

We have started this paper by mentioning the research of Carpenter et al. (1990). Our findings seem to support their assumption: A small set of rules is repeatedly applied (over items) by participants. Moreover, they become more fluent over repeated applications. Also, in three (out of four) conditions, more recent items were more active than earlier ones. In the fourth condition, earlier items seemed to have a stronger effect than later ones, although this may have been due to the artefact that the model tried to pick up the influence from the very first two items (see the discussion at that point in Experiment 1).

Carpenter et al. (1990) also suggested that two factors are important in solving the RPM test: Ability to induce abstract relations (rules) and working memory capacity. Earlier we (Verguts et al., 1999) have conceptualised the rule induction process as a sequential sampling of rules where each rule has a certain probability to be sampled. In that paper, we have investigated one source of individual differences in this rule-finding process, namely, the speed at which a participant can sample rules. In the present paper, we have investigated previous experience as a factor which influences the sampling probabilities.

In the intelligence literature, there is a research line that considers dynamic test situations, that is, test situations in which people learn something while solving the test (Ferrara, Brown, & Campione, 1986; Ferretti & Butterfield, 1992). These authors advocate the study of the role of transfer, or learning, in intelligence. Specifically, they found relations between IQ and number of hints needed to apply an earlier used principle to later items (i.e., transfer), also in the type of material we considered in this paper (RPM items). Ferrara et al. and Ferretti and Butterfield defend the position that a large part of intelligence is possibility of transfer, an aspect that has been neglected, possibly because it is difficult to assess. The same aspect, possibility of transfer, was incorporated in our model in the learning rate parameter b. It was suggested by our findings that there might be individual differences in this learning rate parameter. However, we did not investigate the relation of this learning rate variable to other measures, such as IQ or working memory capacity (Kyllonen & Christal, 1990). Investigation of such relations is a psychometric issue and stands out as a future concern. The main issue in the present paper was whether, and how, people recycle rules over items, with the aim of understanding more clearly the item solution processes involved here.

Manuscript received March 2001
Revised manuscript received August 2001

REFERENCES

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.

Alderton, D.L., & Larson, G.E. (1990). Dimensionality of Raven’s advanced progressive matrices items. Educational and Psychological Measurement, 50, 887–900.

Arthur, W., Jr., & Day, D.V. (1994). Development of a short form for the Raven advanced progressive matrices test. Educational and Psychological Measurement, 54, 394–403.

Carlson, J.S., Jensen, C.M., & Widaman, K.F. (1983). Reaction time, intelligence, and attention. Intelligence, 7, 329–334.

Carpenter, P.A., Just, M.A., & Shell, P. (1990). What one intelligence test measures: A theoretical account of processing in the Raven progressive matrices test. Psychological Review, 97, 404–431.

DeShon, R.P., Chan, D., & Weissbein, D.A. (1995). Verbal overshadowing effects on Raven’s progressive matrices: Evidence for multidimensional performance determinants. Intelligence, 21, 135–155.

Dover, A., & Shore, B.M. (1991). Giftedness and flexibility on a mathematical set-breaking task. Gifted Child Quarterly, 35, 99–105.

Duncker, K. (1945). On problem solving. Psychological Monographs, 58(5).

Embretson, S.E. (1995). The role of working memory capacity and general control processes. Intelligence, 20, 175–186.

Ericsson, K.A., & Simon, H.A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.

Ferrara, R.A., Brown, A.L., & Campione, J.C. (1986). Children’s learning and transfer of inductive reasoning rules: Studies of proximal development. Child Development, 57, 1087–1099.

Ferretti, R.P., & Butterfield, E.C. (1992). Intelligence-related differences in the learning, maintenance, and transfer of problem-solving strategies. Intelligence, 16, 207–223.

Fry, A.F., & Hale, S. (1996). Processing speed, working memory, and fluid intelligence: Evidence for a developmental cascade. Psychological Science, 7, 237–241.

Jensen, A.R. (1987). Process differences and individual differences in some cognitive tasks. Intelligence, 11, 107–136.

Kaplan, I.T., & Schoenfeld, W.N. (1966). Oculomotor patterns during the solution of visually displayed anagrams. Journal of Experimental Psychology, 72, 447–451.

Kotovsky, K., & Simon, H.A. (1973). Empirical tests of a theory of human acquisition of concepts for sequential patterns. Cognitive Psychology, 4, 399–424.

Kyllonen, P., & Christal, R. (1990). Reasoning ability is (little more than) working memory capacity?! Intelligence, 14, 389–434.

Lemay, E.H. (1972). Anagram solutions as a function of task variables and solution word models. Journal of Experimental Psychology, 92, 65–68.

Lovett, M.C., & Anderson, J.R. (1996). History of success and current context in problem solving. Cognitive Psychology, 31, 168–217.

Lovett, M.C., & Schunn, C.D. (1999). Task representations, strategy variability, and base-rate neglect. Journal of Experimental Psychology: General, 128, 107–130.


Luchins, A.S., & Luchins, E.H. (1954). The Einstellung phenomenon and effortfulness of task. Journal of General Psychology, 50, 15–27.

Maier, N.R.F. (1931). Reasoning in humans: II. The solution of a problem and its appearance in consciousness. Journal of Comparative Psychology, 12, 181–194.

Marshalek, B., Lohman, D.F., & Snow, R.E. (1983). The complexity continuum in the radex and hierarchical models of intelligence. Intelligence, 7, 107–127.

Raven, J.C. (1962). Advanced progressive matrices, set II. London, UK: H.K. Lewis.

Reed, T.E., & Jensen, A.R. (1992). Conduction velocity in a brain nerve pathway of normal adults correlates with intelligence level. Intelligence, 16, 259–272.

Ross, B.H. (1984). Remindings and their effects in learning a cognitive skill. Cognitive Psychology, 16, 371–416.

Ross, B.H. (1987). This is like that: The use of earlier problems and the separation of similarity effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 629–639.

Salthouse, T.A. (1991). Mediation of adult age differences in cognition by reductions in working memory and speed of processing. Psychological Science, 2, 179–183.

Schervish, M.J. (1995). Theory of statistics. New York: Springer-Verlag.

Siegel, S., & Castellan, J.N., Jr. (1989). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.

Sweller, J., & Gee, W. (1978). Einstellung, the sequence effect, and hypothesis theory. Journal of Experimental Psychology, 4, 513–526.

Veenman, M.V.J., Elshout, J.J., & Groen, M.G.M. (1993). Thinking aloud: Does it affect regulatory processes in learning? Tijdschrift voor Onderwijsresearch, 18, 322–330.

Verguts, T., De Boeck, P., & Maris, E. (1999). Generation speed in Raven’s Progressive Matrices Test. Intelligence, 27, 329–345.

Verhelst, N.D., & Glas, C.A.W. (1993). A dynamic generalization of the Rasch model. Psychometrika, 58, 395–415.

White, H. (1988). Semantic priming of anagram solutions. American Journal of Psychology, 101, 383–399.

Wiley, J. (1998). Expertise as mental set: The effects of domain knowledge in creative problem solving. Memory and Cognition, 26, 716–730.

