
www.elsevier.com/locate/paid

Personality and Individual Differences 39 (2005) 647–659

A normative and reliability study for the Raven’s Coloured Progressive Matrices for primary school aged children from Victoria, Australia

Sue M. Cotton a,b,*, Patricia M. Kiely a, David P. Crewther c, Brenda Thomson a, Robin Laycock a, Sheila G. Crewther a

a School of Psychological Science, La Trobe University, Vic. 3086, Australia

b Orygen Research Centre, Department of Psychiatry, University of Melbourne, Locked Bag 10 (35 Poplar Rd), Parkville, Vic. 3052, Australia

c Brain Sciences Institute, Swinburne University, Hawthorn 3122, Australia

Received 2 September 2004; received in revised form 31 January 2005; accepted 22 February 2005

Available online 11 April 2005

Abstract

Outcomes of a normative and reliability study of the Raven’s Coloured Progressive Matrices (CPM) are reported for a sample of 618 children from Victoria, Australia, ranging in age from 6.00 to 11.92 years. Percentile ranks are presented for six age levels. Item analysis, internal consistency, and split-half reliabilities are also described. The CPM demonstrated good inter-item consistency and split-half reliability across the age levels. However, item analysis together with the reliability analyses indicated that the number of items in the CPM could be reduced without detriment to the test’s reliability. The psychometric properties of the CPM are discussed and the norms are compared with previously reported data for Australian children.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Matrix reasoning; Norms; Raven’s; Reliability

0191-8869/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.

doi:10.1016/j.paid.2005.02.015

* Corresponding author. Tel.: +61 38346 8211; fax: +61 39347 9099.

E-mail address: [email protected] (S.M. Cotton).


1. Introduction

The Raven’s Coloured Progressive Matrices Test (CPM) has commonly been employed as an estimate of the non-verbal component of Spearman’s g-factor in research, educational and clinical settings, particularly in the United Kingdom (UK) and in Australia. More recently, it has been used in several brain imaging studies searching for a neural locus of fluid intelligence (Gray, Chabris, & Braver, 2003; Prabhakaran, Smith, Desmond, Glover, & Gabrieli, 1997). Advocates of this matrix reasoning task have argued that it is the purest measure of fluid intelligence, particularly for children with reading or language difficulties (Carver, 1990; Stanovich, Cunningham, & Freeman, 1984), physical handicaps (Martin & Wiechers, 1954), and intellectual disabilities (Anderson, Kern, & Cook, 1968; Kilburn, Sanderson, & Melton, 1966; Stacey & Carleton, 1955). It has also been viewed as a culturally and ethnically fair measure of intellectual functioning for both children and adults (Anderson et al., 1968; Carlson & Jensen, 1981; Jensen, 1974; Kaplan & Sacuzzo, 1997; Valencia, 1984).

The CPM first appeared in 1947 as an alternative form of the Raven’s Standard Progressive Matrices (SPM), and was created specifically for children aged between 5 and 11 years (Sattler, 1992). It was revised in 1956, and this version continues to be used today in both clinical and research settings (Raven, Raven, & Court, 1998; Raven, Court, & Raven, 1990).

The CPM comprises 36 items divided into three sets of 12 (sets A, Ab and B). Within each set, items (which are brightly coloured to attract and maintain children’s attention) are ordered by increasing difficulty. The sets also vary in difficulty, with set B containing the most challenging items. The sets have been designed to distinguish between degrees of intellectual maturity by quantifying a child’s ability to form comparisons and to reason by analogy (Raven et al., 1998; Raven et al., 1990).

The first standardization of the CPM occurred in 1949 in the UK town of Dumfries. The sample comprised 627 children aged 5–11 years, representing 25% of the total school population (Raven et al., 1998; Raven et al., 1990). The test was sensitive to fluctuations in intellectual function and demonstrated good test–retest reliability (r = .80) (Raven et al., 1998; Raven et al., 1990). In 1982, a follow-up normative study was conducted in Dumfries with a sample of 598 children. These two standardizations form the core benchmarks against which children’s performances on the CPM have been evaluated.

Apart from the UK normative studies, the CPM has been standardized in the United States (US) (Jensen, 1974), France (Bourdier, 1964), Argentina (Angelini, Alves, Cutodio, & Duarte, 1989), Canada (Yeudall, Fromm, Reddon, & Stefanyk, 1986), Italy (Pruneti, 1985), Hong Kong (Chan & Lynn, 1989), India (Bhogle & Prakash, 1992) and Germany (Kurth, 1970; Schmidtke, Schaller, & Becker, 1990).

Performance on both the CPM and the SPM has improved since 1948, particularly in the UK and the US (Raven, 2000; Raven et al., 1998; Raven et al., 1990; Rodgers, 1999). Compared with the earliest norms, the latest Dumfries testing showed shifts in percentile scores indicating that young people attained higher scores at a younger age (Raven, 2000). Similar gains have been observed over the same period on traditional intelligence tests such as the Wechsler Intelligence Scales (WIS) and the Stanford-Binet Intelligence Scales. This phenomenon is referred to as the Flynn effect (Rodgers, 1999) and has been attributed to a variety of factors, such as greater availability of educational opportunities, the introduction of television, and improved nutrition and welfare (Anastasi & Urbina, 1997; Raven, 2000; Rodgers, 1999; Thorndike, 1977). Rodgers (1999) argued that the changes seen with matrix reasoning tasks provide the strongest evidence of the Flynn effect.

The Flynn effect can distort the interpretation of test results, because outdated norms may lead to overestimation of an individual’s abilities (Raven, 2000). As the only Australian norms for children on the CPM were published over two decades ago, re-norming is considered prudent to ascertain whether such changes are occurring in a very stable socioeconomic culture.

The only Australian normative study for the CPM was conducted in 1975 in the state of Queensland (Reddington & Jackson, 1981). A total of 693 children were sampled from 12 state primary schools and one pre-school within the Brisbane area. Reddington and Jackson (1981) used seven midpoints to distinguish age groups (5.5, 6.5, 7.5, 8.5, 9.5, 10.5 and 11.5 years). Children were allocated to the following categories based on the 1971 Australian census: rural (14%) and urban (86%); non-English speaking backgrounds (10%); and indigenous Australians (1%). The majority of children were from a low socioeconomic background (64%). Percentile rankings, reliability estimates, and mean scores for age levels and gender were provided. No gender differences in performance were noted; however, children older than 8 years were reported to have found the test too easy. Despite this problem, the CPM was found to have good reliability as determined by Cronbach’s alpha, with estimates ranging from .79 for 7 year olds up to .90 for 11 year olds.

Normative studies frequently report reliability statistics such as internal consistency, split-half reliability and test–retest reliability. Measures employed to assess the internal consistency of the CPM include the Kuder–Richardson Formula 20 (K–R 20), Cronbach’s alpha, and item analysis. Estimates of internal consistency using the K–R 20 and Cronbach’s alpha have averaged around .85 (Cantwell, 1967; Simoes, 1989). Slightly higher estimates of internal consistency have been found using item analysis; for example, Green and Kluever (1991) reported a reliability coefficient from item analysis of .89. These results indicate that the CPM has good inter-item consistency.

Split-half reliability provides another index of the content sampling and internal consistency of a measure (Anastasi & Urbina, 1997). Split-half reliabilities have ranged from a low of .46 for kindergarten children (Harris, 1959) to a high of .92 for children aged between 6 and 14 years (Wenke & Muller, 1966), with the majority of studies reporting correlations around .80 (Freyburg, 1966; Khatena, 1965; Muller, 1970; Simoes, 1989). A major problem with split-half reliability is that some researchers have contrasted performance on the first 18 items (i.e., items A1 to Ab6) with performance on the second 18 items (i.e., items Ab7 to B12). This approach is inappropriate for tests in which items increase in difficulty, because the first half of the test is then not equivalent to the second half (Anastasi & Urbina, 1997). A more appropriate approach for tests such as the CPM is to contrast scores on the odd- and even-numbered items.

Estimates of internal consistency, split-half reliability and test–retest reliability for the CPM have been reasonable, although slightly lower estimates are found for younger children (Harris, 1959; Khatena, 1965). Estimates vary depending on the reliability statistic used; however, they tend to average around .80.

This study has two aims. The first is to generate CPM norms for a sample of Australian primary school aged children from the state of Victoria and to contrast these findings with the only available Australian data on the CPM (Reddington & Jackson, 1981). The second is to describe the item characteristics and reliability estimates of this test for the same sample of Victorian children. Specifically, internal consistency, split-half reliability and item difficulty are examined.


2. Method

2.1. Participants

The sample comprised 618 children (males n = 329, females n = 286, gender not recorded n = 3) ranging in age from 6 to 11 years. Chi-square analysis indicated that the distribution of males and females did not differ significantly across the age levels, χ²(5, N = 615) = 2.31, p = .81.

The sample was derived from a variety of sources within the state of Victoria, Australia. Three hundred and eighty students were recruited through eight state schools. These schools were arbitrarily selected from four of the nine Department of Education, Employment, and Training (DEET) regions within Victoria. Four schools were situated within the Southern Metropolitan region (n = 224 children), two schools were from the Northern Metropolitan region (n = 62), one school was located within the Eastern Metropolitan region (n = 63), and one school was from the rural Loddon-Campaspe region (n = 31). It is worth noting that Reddington and Jackson (1981) sampled only from state schools; in this study, however, we felt it was important to include children from the other two major educational systems: the Catholic and independent school systems. Accordingly, there were 189 children from three Catholic schools, two located within the North Eastern Catholic education system region (n = 172) and one from the East Central region (n = 17). An independent private school from the inner city was also included in the sample (n = 49). The diversity of sampling procedures enabled the recruitment of children from a wide range of socio-economic status (SES) backgrounds. Children represented families from low, working, middle and upper class backgrounds. However, specific details regarding SES and ethnicity were unavailable.

Ethics approval for the study was obtained from the La Trobe University Human Ethics Committee. Permission to conduct testing in schools was obtained from the Directorate of School Education (Victoria), the Catholic Education Office Victoria and the Principals or Councils of Management of the schools. Active parental or guardian consent, as prescribed by institutional ethics committees (McMorris et al., 2004), was obtained prior to testing, and children were free to withdraw from testing at any time.
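The χ² statistic reported above can be checked against the male and female counts per age level given in Table 1. The following Python sketch assumes a standard Pearson test of independence on the 2 × 6 table of observed counts, without continuity correction; the paper does not spell out the exact procedure, and the function name `pearson_chi_square` is illustrative.

```python
"""Pearson chi-square test of independence for gender x age level.

Counts are the male/female n per age level (6-11 years) from Table 1;
children with missing gender are excluded, giving N = 615.
"""

males = [25, 39, 62, 61, 72, 70]
females = [25, 39, 62, 45, 60, 55]


def pearson_chi_square(table):
    """Return (statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if gender and age level were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = pearson_chi_square([males, females])
print(f"chi2({df}, N = {sum(males) + sum(females)}) = {statistic:.2f}")
```

Run as-is, this prints a statistic of about 2.31 on 5 degrees of freedom, in line with the reported value. The sketch stops at the statistic; if SciPy is available, `scipy.stats.chi2.sf(statistic, df)` gives the corresponding upper-tail p value.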

2.2. Procedure

Trained clinicians administered the book form of the CPM to each child individually (Raven et al., 1998; Raven et al., 1990). No time limit was assigned for the task, and the standard administration procedure prescribed by Raven et al. (1998) was used.

3. Results

3.1. Data screening

Data were screened using a variety of techniques (e.g., examination of histograms and boxplots, and calculation of skewness, kurtosis, Levene’s test, and Mauchly’s test of sphericity) to detect outliers and to assess the assumptions of normality, homogeneity of variance and sphericity (i.e., that the variances of the differences between all pairs of repeated measurements are equal; Everitt, 1996). No outliers were detected; however, for children aged 9–11 years the data were slightly negatively skewed, indicating that a large percentage of the children in this age range found the CPM relatively easy.

Table 1

Means and standard deviations (SD) of the total CPM score for each age level and gender

| Age (years) | Males n | Males M | Males SD | Females n | Females M | Females SD | Total N | Total M | Total SD | Scoring ≥34, % (n) |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 25 | 20.16 | 5.55 | 25 | 16.96 | 3.77 | 50 | 18.78 | 5.04 | 0.0 (0) |
| 7 | 39 | 22.15 | 6.39 | 39 | 20.51 | 4.77 | 78 | 21.33 | 5.69 | 1.28 (1) |
| 8 | 62 | 24.81 | 6.24 | 62 | 24.35 | 6.21 | 124 | 24.58 | 6.21 | 5.65 (7) |
| 9 | 61 | 27.44 | 5.85 | 45 | 26.31 | 6.05 | 106 | 26.96 | 5.93 | 9.43 (10) |
| 10ᵃ | 72 | 29.49 | 4.69 | 60 | 28.43 | 3.48 | 134 | 29.02 | 4.18 | 16.42 (22) |
| 11ᵇ | 70 | 28.89 | 3.92 | 55 | 30.20 | 3.85 | 126 | 29.40 | 3.99 | 18.25 (23) |
| Total | 329 | 26.55 | 6.11 | 286 | 25.47 | 6.30 | 618 | 26.06 | 6.21 | 10.19 (63) |

ᵃ Information on the child’s gender was missing for 2 cases.
ᵇ Gender was missing for one child.

3.2. Sample characteristics

The means and standard deviations of the total CPM score for each of the six age levels are presented in Table 1. To determine the effect of age-related developmental changes on mean total scores, a one-way analysis of variance (ANOVA) was conducted. Significant age group differences were found for the total CPM score, F(5, 612) = 54.48, p < .01, η² = .31, reflecting a general increase in mean scores with age.

Post hoc analysis was conducted using the Student–Newman–Keuls test. Children 10 years and older had significantly higher mean CPM scores than children younger than 10 years, p < .05. Significant differences in CPM performance were also found among the 6, 7, 8, and 9 year olds, p < .05. The greatest mean difference was between the 7 and 8 year olds (Mdiff = −3.25).

Gender differences at each age level were examined for the total CPM score using a series of independent groups t-tests (see Table 1 for means and standard deviations). For each of these tests, alpha (α) was set at the conservative level of .01. Only for the 6 year olds was the difference significant, with p just below α: males had a higher mean total score than females, t(48) = 2.714, p = .009.

From 7 years onwards, nine children achieved the maximum possible score of 36. The percentages of children within each age level who scored 34 or higher are displayed in Table 1; this percentage increased with age.
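Because Table 1 reports the size, mean and SD of each age group, the one-way ANOVA can be approximately reconstructed from these summary statistics. The sketch below is illustrative rather than the authors’ analysis: the helper `anova_from_summary` is hypothetical, and rounding in the tabled means and SDs means the result will not exactly match the F computed from raw scores.

```python
"""One-way ANOVA F and eta-squared from group summary statistics.

(n, mean, SD) for age levels 6-11 are the total-sample columns of Table 1.
The tabled values are rounded, so the result is an approximation.
"""

groups = [
    (50, 18.78, 5.04),
    (78, 21.33, 5.69),
    (124, 24.58, 6.21),
    (106, 26.96, 5.93),
    (134, 29.02, 4.18),
    (126, 29.40, 3.99),
]


def anova_from_summary(groups):
    """Return (F, df_between, df_within, eta_squared)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares, rebuilt from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f, df_between, df_within, eta_squared


f, df_b, df_w, eta2 = anova_from_summary(groups)
print(f"F({df_b}, {df_w}) = {f:.2f}, eta^2 = {eta2:.2f}")
```

With the Table 1 values this gives an F of roughly 54.4 on (5, 612) degrees of freedom and an η² of about .31, close to the reported values. Here η² is the between-groups sum of squares divided by the total sum of squares.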

3.3. Percentile ranks

Percentile ranks for each of the age levels are presented in Table 2. Gender was not considered in calculating percentile ranks, as the gender differences were confined to the 6 year olds.


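Table 2

Percentile scores (raw CPM scores at selected percentiles) for each age level

| Percentile | Age 6 | Age 7 | Age 8 | Age 9 | Age 10 | Age 11 |
|---|---|---|---|---|---|---|
| 95 | 30 | 31 | 34 | 34 | 35 | 35 |
| 90 | 25 | 29 | 33 | 33 | 34 | 34 |
| 85 | 23 | 28 | 32 | 33 | 34 | 34 |
| 80 | 22 | 27 | 30 | 32 | 33 | 33 |
| 75 | 22 | 25 | 29 | 32 | 32 | 33 |
| 70 | 22 | 25 | 28 | 31 | 31 | 32 |
| 65 | 21 | 24 | 28 | 31 | 31 | 32 |
| 60 | 20 | 23 | 27 | 30 | 30 | 31 |
| 55 | 19 | 22 | 26 | 29 | 30 | 30 |
| 50 | 19 | 21 | 25 | 29 | 30 | 30 |
| 45 | 17 | 20 | 24 | 28 | 29 | 29 |
| 40 | 17 | 19 | 23 | 27 | 29 | 28 |
| 35 | 16 | 18 | 22 | 25 | 28 | 28 |
| 30 | 15 | 17 | 21 | 24 | 28 | 27 |
| 25 | 15 | 17 | 20 | 23 | 26 | 27 |
| 20 | 14 | 16 | 19 | 21 | 25 | 26 |
| 15 | 13 | 15 | 17 | 20 | 24 | 25 |
| 10 | 13 | 14 | 16 | 18 | 23 | 23 |
| 5 | 12 | 13 | 15 | 14 | 21 | 22 |
| N | 50 | 78 | 124 | 106 | 134 | 126 |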
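Table 2 lists the raw score at each tabled percentile rather than the percentile for each raw score, so applying it to an individual child requires a lookup in the other direction. The sketch below is a hypothetical helper, not part of the published norms: it returns the highest tabled percentile whose score does not exceed the child’s raw score. That convention is an assumption; where several percentiles share the same score (e.g., 22 at the 70th to 80th percentiles for 6 year olds), it reports the highest of them.

```python
"""Hypothetical lookup of a percentile band from the Table 2 norms."""

PERCENTILES = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50,
               45, 40, 35, 30, 25, 20, 15, 10, 5]

# Raw CPM score at each percentile above, by age level (Table 2).
NORMS = {
    6: [30, 25, 23, 22, 22, 22, 21, 20, 19, 19, 17, 17, 16, 15, 15, 14, 13, 13, 12],
    7: [31, 29, 28, 27, 25, 25, 24, 23, 22, 21, 20, 19, 18, 17, 17, 16, 15, 14, 13],
    8: [34, 33, 32, 30, 29, 28, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 17, 16, 15],
    9: [34, 33, 33, 32, 32, 31, 31, 30, 29, 29, 28, 27, 25, 24, 23, 21, 20, 18, 14],
    10: [35, 34, 34, 33, 32, 31, 31, 30, 30, 30, 29, 29, 28, 28, 26, 25, 24, 23, 21],
    11: [35, 34, 34, 33, 33, 32, 32, 31, 30, 30, 29, 28, 28, 27, 27, 26, 25, 23, 22],
}


def percentile_band(age, raw_score):
    """Return the highest tabled percentile whose norm score <= raw_score.

    Returns None when the score is below the 5th-percentile score for that age.
    """
    # Percentiles are listed from highest to lowest, so the first match is the highest.
    for pct, score in zip(PERCENTILES, NORMS[age]):
        if raw_score >= score:
            return pct
    return None


print(percentile_band(9, 30))
print(percentile_band(6, 11))
```

For example, a 9 year old scoring 30 falls in the 60th percentile band under this convention, while a 6 year old scoring below 12 falls below the lowest tabled point.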


3.4. Item characteristics

The item difficulties, or proportions of correct responses, were calculated for each of the CPM items (see Table 3). The proportions of correct responses are referred to as p values. Items with p values greater than .80 or less than .20 are marked with an asterisk.

For each age level there were numerous items with extreme p values. For the 6 year olds there were 18 items with p values either less than .20 or greater than .80. This can be contrasted with the 11 year olds, for whom there were 25 items with extreme p values. This is not surprising given the expected developmental gains in CPM performance with age.

3.5. Reliability

The internal consistency of the CPM was assessed using the K–R 20, which provides a measure of internal consistency for scales with dichotomously scored items (Anastasi & Urbina, 1997; Portney & Watkins, 2000; Vogt, 1993). Split-half reliability was determined by comparing scores on the odd- and even-numbered items and applying the Spearman–Brown formula (Anastasi & Urbina, 1997). Reliability estimates were calculated for each age level and for the total sample (see Table 4).

Internal consistency estimates ranged from a low of .76 (11 year olds) to a high of .88 (8 and 9 year olds). Similar results were obtained for split-half reliability, with values ranging from .81 (10 and 11 year olds) to .90 (9 year olds).
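The two reliability coefficients can be sketched for a dichotomously scored response matrix. K–R 20 is (k/(k − 1))(1 − Σpq/σ²), where k is the number of items, p and q are the proportions passing and failing each item, and σ² is the variance of total scores; the split-half estimate correlates odd- and even-item half scores and steps the correlation up with the Spearman–Brown formula 2r/(1 + r). The code below is a minimal illustration on hypothetical toy data, assuming population (divide-by-N) variances; it is not the software used in the study.

```python
"""K-R 20 and odd-even split-half (Spearman-Brown) reliability.

Input is a 0/1 matrix: rows = children, columns = items in test order.
Population (divide-by-N) variances are used for items and totals alike.
"""
from statistics import mean, pvariance


def kr20(responses):
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    # Sum of item variances p*q, where p is the proportion passing each item.
    sum_pq = 0.0
    for item in zip(*responses):
        p = mean(item)
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_odd_even(responses):
    # Items 1, 3, 5, ... form one half; items 2, 4, 6, ... the other.
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r_half = pearson_r(odd, even)
    # Spearman-Brown correction from half-test to full-test length.
    return 2 * r_half / (1 + r_half)


# Hypothetical toy data: 4 children x 4 items (not from the study).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print(round(kr20(toy), 3), round(split_half_odd_even(toy), 3))
```

On the toy matrix this prints 0.667 and 0.828. Splitting by odd and even items, rather than first and second halves, keeps both halves spread across the full difficulty range of a test whose items become progressively harder, as discussed in the Introduction.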


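Table 3

Item difficulties (p values) for the CPM by age group

| CPM item | Age 6 | Age 7 | Age 8 | Age 9 | Age 10 | Age 11 | Total |
|---|---|---|---|---|---|---|---|
| A1 | .94* | .96* | .99* | 1.00* | .98* | 1.00* | .98* |
| A2 | 1.00* | .96* | .99* | .97* | .99* | 1.00* | .99* |
| A3 | .98* | .96* | .98* | .97* | .96* | 1.00* | .98* |
| A4 | 1.00* | .96* | .98* | .97* | .98* | 1.00* | .98* |
| A5 | .98* | .95* | .96* | .92* | .98* | .98* | .96* |
| A6 | .94* | .87* | .93* | .95* | .97* | .99* | .95* |
| A7 | .58 | .69 | .69 | .79 | .87* | .84* | .76 |
| A8 | .62 | .74 | .69 | .73 | .81* | .90* | .76 |
| A9 | .44 | .69 | .78 | .87* | .91* | .94* | .81* |
| A10 | .60 | .44 | .67 | .74 | .84* | .83* | .70 |
| A11 | .08* | .17* | .25 | .40 | .49 | .45 | .33 |
| A12 | .18* | .18* | .23 | .32 | .34 | .25 | .26 |
| Ab1 | .94* | .95* | .96* | 1.00* | .98* | 1.00* | .98* |
| Ab2 | .82* | .91* | .96* | .98* | .95* | .96* | .94* |
| Ab3 | .86* | .87* | .95* | .92* | .99* | .98* | .94* |
| Ab4 | .58 | .68 | .84* | .89* | .93* | .97* | .84* |
| Ab5 | .46 | .73 | .74 | .86* | .96* | .91* | .81* |
| Ab6 | .48 | .59 | .73 | .81* | .93* | .91* | .77 |
| Ab7 | .48 | .51 | .74 | .88* | .87* | .97* | .77 |
| Ab8 | .22 | .39 | .52 | .72 | .76 | .81* | .61 |
| Ab9 | .26 | .36 | .65 | .74 | .88* | .85* | .67 |
| Ab10 | .46 | .46 | .62 | .71 | .78 | .77 | .66 |
| Ab11 | .30 | .38 | .54 | .69 | .76 | .83* | .62 |
| Ab12 | .12* | .13* | .24 | .42 | .43 | .39 | .31 |
| B1 | .94* | .99* | .97* | .98* | 1.00* | 1.00* | .98* |
| B2 | .64 | .78 | .88* | .89* | .93* | .98* | .87* |
| B3 | .64 | .89* | .89* | .94* | .99* | 1.00* | .92* |
| B4 | .72 | .76 | .83* | .90* | .97* | .99* | .88* |
| B5 | .38 | .59 | .66 | .79 | .88* | .92* | .73 |
| B6 | .50 | .50 | .67 | .66 | .84* | .76 | .68 |
| B7 | .24 | .37 | .36 | .46 | .51 | .40 | .40 |
| B8 | .06* | .17* | .39 | .43 | .55 | .59 | .40 |
| B9 | .12* | .22 | .45 | .49 | .72 | .65 | .48 |
| B10 | .16* | .29 | .49 | .55 | .74 | .75 | .53 |
| B11 | .06* | .14* | .32 | .48 | .57 | .58 | .39 |
| B12 | .02* | .11* | .15* | .15* | .31 | .23 | .17* |

* Extreme item difficulty (p < .20 or p > .80).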


Reliability estimates were recalculated to determine the effect on reliability of items with extreme responses (i.e., p less than .20 or greater than .80). The resulting estimates were generally slightly lower than those for the full item pool. For two age levels (10 and 11 years), there was a considerable difference between the reliability estimates for the full and reduced item sets. This was attributed to these two age levels having only 12 and 11 items remaining, respectively, after items with extreme p values were removed.
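The loss of reliability when items are dropped can be related to test length through the Spearman–Brown prophecy formula, ρ′ = nρ/(1 + (n − 1)ρ), where n is the ratio of new to original length. The sketch below applies it to the 11 year olds’ K–R 20 of .76 (Table 4) with 11 of 36 items retained. It is only a rough benchmark, because the formula assumes the removed items are parallel to those kept, which items with extreme p values are not; `spearman_brown_prophecy` is an illustrative name.

```python
"""Spearman-Brown prophecy: predicted reliability after changing test length.

Assumes the removed (or added) items are parallel to the rest of the test,
which the CPM's extreme-difficulty items are not.
"""


def spearman_brown_prophecy(reliability, length_ratio):
    """Predicted reliability when test length is multiplied by length_ratio."""
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)


# 11 year olds: K-R 20 = .76 with all 36 items; 11 items remain after removal.
predicted = spearman_brown_prophecy(0.76, 11 / 36)
print(f"predicted K-R 20 with 11 items: {predicted:.2f} (observed: .69)")
```

The formula predicts about .49, lower than the observed .69 for the reduced item set. This gap is consistent with the removed items having contributed relatively little to score variance, although it does not by itself establish why.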


Table 4

Internal consistency and split-half reliability estimates for the CPM by age group

| Age group | N | K–R 20: total items | K–R 20: items removed | Spearman–Brown: total items | Spearman–Brown: items removed |
|---|---|---|---|---|---|
| 6 | 50 | .81 | .77 | .89 | .85 |
| 7 | 78 | .85 | .82 | .83 | .84 |
| 8 | 121 | .88 | .87 | .88 | .85 |
| 9 | 103 | .88 | .86 | .90 | .87 |
| 10 | 106 | .81 | .75 | .81 | .78 |
| 11 | 92 | .76 | .69 | .81 | .71 |
| Total | 550 | .89 | .87 | .90 | .88 |


4. Discussion

The current study provides up-to-date CPM data for a sample of 618 children from the state of Victoria. These data can be used to estimate the general intellectual ability of Australian children. The only earlier Australian CPM norms for primary school aged children were collected three decades ago in the state of Queensland (Reddington & Jackson, 1981). Using outdated norms as benchmarks for judging children’s performance on the CPM may result in spurious conclusions about their abilities (Raven, 2000; Raven et al., 1998).

4.1. Differences between 1975 and current CPM norms for Australian children

Reddington and Jackson’s (1981) data provide the first estimate of general intellectual ability, as measured by the CPM, in a sample of Australian children. To contrast key findings of Reddington and Jackson’s (1981) study with the current research, the medians (50th percentiles) and the 25th and 75th percentiles for each age level are presented in Table 5. In interpreting these results it is important to consider the similarities and dissimilarities between the two studies. Reddington and Jackson’s (1981) study had a similar sample size and a similar age and gender distribution to the current study. The studies, however, differ in sampling procedures and may differ in demographic attributes (state characteristics, socio-economic status, and the proportions of children from non-English speaking backgrounds and of indigenous Australians) and in the characteristics of the education departments. Reddington and Jackson (1981) based their recruitment strategies on census data and recruited children only through state schools, without sampling the other major school systems. In the current study, children were sampled extensively across the state’s educational systems and regions to ensure sample diversity, although we did not sample according to census data. Additionally, the social and economic cultures of the two Australian states at the two time points may be very different.

Although these two Australian normative studies were conducted nearly three decades apart and used different sampling techniques, their medians and 25th and 75th percentiles are remarkably similar. For the 7 year olds the medians were identical, and for the remaining age groups they differed by at most two points; the 25th percentiles were identical for the 6, 7, and 9 year olds. If one assumes that these studies are comparable, then it could be argued that there were minimal, if any, changes between the 1975 standardization and the current study, and that the performance of Australian children on the CPM has remained exceptionally stable over time. Caution, however, is warranted, given the dissimilarities in sampling between the two studies.

Table 5

A comparison of selected findings from the current study with results obtained by Reddington and Jackson (1981) and the 1982 Dumfries normative data

| Age | n: Aus 2003ᵃ | n: Aus 1975ᵇ | n: UK 1982ᶜ | Mdn: Aus 2003 | Mdn: Aus 1975 | Mdn: UK 1982 | 25th: Aus 2003 | 25th: Aus 1975 | 25th: UK 1982 | 75th: Aus 2003 | 75th: Aus 1975 | 75th: UK 1982 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 50 | 100 | 42 | 19 | 18 | 17 | 15 | 15 | 14 | 22 | 21 | 20 |
| 7 | 78 | 100 | 55 | 21 | 21 | 20 | 17 | 17 | 17 | 25 | 24 | 23 |
| 8 | 124 | 100 | 48 | 25 | 26 | 24 | 20 | 22 | 20 | 29 | 29 | 27 |
| 9 | 106 | 100 | 37 | 29 | 28 | 28 | 23 | 23 | 24 | 32 | 32 | 31 |
| 10 | 134 | 100 | 49 | 30 | 31 | 31 | 26 | 27 | 26 | 32 | 33 | 33 |
| 11 | 126 | 100 | 55 | 30 | 32 | 32 | 27 | 30 | 30 | 33 | 33 | 34 |

ᵃ Normative data from this study.
ᵇ Normative data from Reddington and Jackson’s (1981) study.
ᶜ Normative data from Raven et al. (1998).

The possibility of stable CPM performance in Australian children not only contradicts the proposal of Raven et al. (1998) that population intellectual function changes over time, but also fails to support the generalized notion of the Flynn effect. Raven et al.’s (1998) premise was based on differences found between the 1948 and 1982 standardizations of the CPM in Dumfries. Raven (2000) attributed the changes in performance on matrix reasoning tasks to improvements in “nutrition, welfare, and hygiene” (p. 32). Others have argued that changes in intelligence test performance are due to earlier cognitive maturation resulting from enhanced early education (Rodgers, 1999; Thorndike, 1977). Rodgers (1999) argued that both period and cohort effects contribute to gains in intelligence. Period effects are the consequences of social, educational and economic influences at a particular point in time that may influence a child’s ability. Cohort effects may also incorporate social, educational and economic factors; however, they are specific to subgroups within the population, such as particular schools.

If, as these data suggest, Australian children have shown no gains in performance, one may argue that this reflects a lack of period effects: the changes in social, educational and economic settings in Australia between 1975, when Reddington and Jackson (1981) collected their data, and 2005 may be negligible. Conversely, the differences in Dumfries between the post World War II period of 1948 and 1982 would be expected to be much greater, particularly in terms of social, nutritional, and economic environments.

Also noteworthy is that the norms from both Australian studies are almost identical to the norms provided in the 1982 Dumfries standardization (see Table 5). Indeed, the greatest gains in intelligence and in performance on matrix reasoning tasks have generally been confined to the period before the 1980s (Raven, 2000).


4.2. Reliability of the CPM

The estimates of internal consistency and split-half reliability for this study were remarkablysimilar to those of previous studies (Cantwell, 1967; Freyburg, 1966; Khatena, 1965; Muller,1970; Reddington & Jackson, 1981; Simoes, 1989). For the total sample of children internal con-sistency (r = .89) and split-half reliability (r = .91) were high. Additionally, there was minimal var-iation across the age levels with respect to the reliability estimates, contradicting previous reportsof the CPM being a less reliable measure of non-verbal ability in younger children (Harris, 1959;Khatena, 1965).Item analysis revealed some interesting information on the psychometric properties of the

CPM. The proportion of correct responses for some items was either too high (p > .80) or too low (p < .20). For the overall sample, 16 items had extreme p values. Items with extreme p values are superfluous because they provide little differential information about individual differences (Anastasi & Urbina, 1997). Ideally, the items should be spread around an average difficulty of .5.

The limited contribution of the CPM items with extreme p values became evident when the reliability coefficients were recalculated without them: removing these items produced minimal changes in the overall reliability estimates. Within age levels, however, there were sizeable drops in the reliability estimates for 10- and 11-year-olds. The drop was more noticeable for split-half reliability than for K–R 20, a finding consistent with Carlson and Jensen's (1981) study. These drops reflect the number of items with extreme p values for 10- and 11-year-olds (24 and 25, respectively). There is a trade-off between eliminating items that are too easy or too difficult for children and shortening the test to the point that reliability is adversely affected (Carlson & Jensen, 1981). For these children, researchers and clinicians may need to adopt the more advanced Standard Progressive Matrices (SPM) to obtain a more reliable assessment of fluid intelligence.

A problem with reliability and item analyses for the CPM is that the scoring scheme assumes that the test measures a single underlying construct, non-verbal fluid intelligence. This is not necessarily the case. Evidence from factor-analytic studies indicates that CPM performance may be represented by several underlying factors or dimensions. These studies have generally reported three factors: continuous and discrete pattern completion; pattern completion through closure; and concrete and abstract reasoning (Carlson & Jensen, 1980; Wiedl & Carlson, 1976). This suggests that the items themselves are qualitatively different (Raven et al., 1990).

Brain imaging investigations support the factor-analytic studies. Prabhakaran et al. (1997) reported that brain activation, as measured by functional magnetic resonance imaging (fMRI), varied with the type of item in the adult version of Raven's matrices, the SPM. SPM items that involved analytical reasoning activated bilateral frontal regions and left temporal, parietal and occipital regions, areas of the cortex known to be associated with verbal working memory and executive functions. In contrast, items that required figural or visuo-spatial reasoning activated regions associated with spatial working memory, such as right frontal and bilateral parietal regions.

Together, findings from factor-analytic and imaging studies support the notion that different abilities underpin the problem solving involved in any of Raven's matrices tasks. A single total score cannot differentiate between items that depend on analytical versus figural reasoning. The scoring scheme used with Raven's matrices may need to recognize that the items assess multiple abilities rather than a one-dimensional trait. Scores derived from the test need to differentiate not only between types of items but also between the skills necessary for successful task completion.
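To make the point about single total scores concrete, the following is a minimal Python sketch that reports one subscore per dimension instead of one total. The dimension labels echo the three factors named above, but the item IDs (`item_01` and so on), the `ITEM_DIMENSION` mapping and the `subscores` helper are hypothetical; the assignments are invented for illustration and are not the CPM's actual items or any published factor loadings.

```python
from collections import defaultdict

# HYPOTHETICAL item-to-dimension mapping. The labels follow the three factors
# reported in the factor-analytic literature, but these item IDs and their
# assignments are invented and do NOT reflect real CPM items.
ITEM_DIMENSION = {
    "item_01": "pattern_completion",
    "item_02": "pattern_completion",
    "item_03": "closure",
    "item_04": "closure",
    "item_05": "reasoning",
    "item_06": "reasoning",
}


def subscores(responses: dict[str, int], mapping: dict[str, str]) -> dict[str, int]:
    """Sum 0/1 item scores separately for each dimension rather than into one total."""
    totals: dict[str, int] = defaultdict(int)
    for item, score in responses.items():
        if score not in (0, 1):
            raise ValueError(f"{item}: expected a 0/1 score, got {score!r}")
        totals[mapping[item]] += score  # KeyError if an item is unmapped
    return dict(totals)


# Two invented response patterns with the same total score of 3.
CHILD_A = {"item_01": 1, "item_02": 1, "item_03": 0,
           "item_04": 1, "item_05": 0, "item_06": 0}
CHILD_B = {"item_01": 0, "item_02": 0, "item_03": 1,
           "item_04": 0, "item_05": 1, "item_06": 1}

if __name__ == "__main__":
    for name, child in (("A", CHILD_A), ("B", CHILD_B)):
        profile = subscores(child, ITEM_DIMENSION)
        print(name, "total:", sum(profile.values()), "profile:", profile)
```

Both invented children obtain the same total, yet their profiles differ (one is stronger on the pattern-completion items, the other on the reasoning items), which is exactly the information a single total score discards.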

5. Conclusions

Several substantial conclusions can be drawn from this study. It may be speculated that there have been minimal changes in Australian children's performance on the CPM since 1975, which conflicts with Raven's (2000) view that performance on the Raven's Progressive Matrices tests has changed over time. Despite the negligible differences between the two normative studies, caution is still warranted in interpreting research findings regarding Flynn effects.

The current study provides an up-to-date snapshot of CPM performance in a sample of primary school aged children from Victoria, Australia. For these children, the CPM provides a reliable means of measuring non-verbal intellectual development, with reliability consistent with previous findings. There is a need, however, to further scrutinize the psychometric properties of the CPM, especially item difficulty and the appropriateness of the items for various age levels.
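The item screening and reliability recalculation discussed in this paper can be sketched in a few lines of Python. This is a minimal illustration, assuming a children × items matrix of 0/1 scores; the function names and the ten-child, six-item `TOY_SCORES` matrix are hypothetical and invented, not data from this study. The .20/.80 cut-offs come from the text. The paper does not state how its split-half coefficient was formed, so the odd–even split with Spearman–Brown correction used here is an assumption (a common convention, not necessarily the authors' procedure). To mirror the age-level analyses, the functions would be run separately on each age group's rows.

```python
from statistics import pvariance


def item_difficulties(scores):
    """Item p values: proportion of children answering each item correctly.

    `scores` is a list of rows, one per child, each a list of 0/1 item scores.
    """
    n_children = len(scores)
    n_items = len(scores[0])
    return [sum(row[j] for row in scores) / n_children for j in range(n_items)]


def flag_extreme(p_values, low=0.20, high=0.80):
    """Indices of items that are too hard (p < low) or too easy (p > high)."""
    return [j for j, p in enumerate(p_values) if p < low or p > high]


def kr20(scores):
    """Kuder-Richardson formula 20: (k / (k - 1)) * (1 - sum(p * q) / var(total))."""
    k = len(scores[0])
    sum_pq = sum(p * (1 - p) for p in item_difficulties(scores))
    total_var = pvariance([sum(row) for row in scores])  # population variance
    return (k / (k - 1)) * (1 - sum_pq / total_var)


def _pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_odd_even(scores):
    """Odd-even split-half reliability, stepped up with the Spearman-Brown formula."""
    odd = [sum(row[0::2]) for row in scores]   # 1st, 3rd, 5th, ... items
    even = [sum(row[1::2]) for row in scores]  # 2nd, 4th, 6th, ... items
    r_half = _pearson(odd, even)
    return 2 * r_half / (1 + r_half)


def drop_items(scores, items):
    """Return a copy of `scores` with the given item indices removed."""
    drop = set(items)
    return [[x for j, x in enumerate(row) if j not in drop] for row in scores]


# Invented 0/1 scores for ten children on six items (illustration only).
TOY_SCORES = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    p = item_difficulties(TOY_SCORES)
    extreme = flag_extreme(p)
    print("p values:", p)
    print("extreme items:", extreme)
    print("K-R 20, all items:", round(kr20(TOY_SCORES), 3))
    print("split-half, all items:", round(split_half_odd_even(TOY_SCORES), 3))
    shortened = drop_items(TOY_SCORES, extreme)
    print("K-R 20, extreme items removed:", round(kr20(shortened), 3))
```

In this tiny invented set, dropping the two extreme items lowers the K–R 20 estimate, a small-scale version of the trade-off between discarding uninformative items and shortening the test. The sketch does not guard against a zero total-score variance, which would make K–R 20 undefined.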

Acknowledgements

We acknowledge Australian Research Council (ARC) funding for Dr. Patricia Kiely's position under project grant #A0000937. We would like to thank the principals, teachers, and students at the schools where we conducted this research.

References

Anastasi, A., & Urbina, S. (1997). Psychological testing. New Jersey: Prentice-Hall.

Anderson, H. E., Kern, F. E., & Cook, C. (1968). Sex, brain damage, and race effects in the Progressive Matrices with retarded population. Journal of Social Psychology, 76, 207–211.

Angelini, A. L., Alves, I. C. B., Cutodio, E. M., & Duarte, W. F. (1989). The Sao Paulo norms for the Raven's Coloured Progressive Matrices. Psychological Test Bulletin, 2(2), 46–49.

Bhogle, S., & Prakash, I. (1992). Performance of Indian children on the Coloured Progressive Matrices. Psychological Studies, 37(2–3), 178–181.

Bourdier, G. (1964). Utilisation et nouvel etalonnage du PM 47. Bulletin de Psychologie, 18(18), 29–34.

Cantwell, Z. M. (1967). The performance of American pupils on the Coloured Progressive Matrices. British Journal of Educational Psychology, 37, 389–390.

Carlson, J. S., & Jensen, C. M. (1980). The factorial structure of the Raven Coloured Progressive Matrices Test: A reanalysis. Educational and Psychological Measurement, 40, 1111–1116.

Carlson, J. S., & Jensen, C. M. (1981). Reliability of the Raven Colored Progressive Matrices Test: Age and ethnic group comparisons. Journal of Consulting and Clinical Psychology, 49(3), 320–322.

Carver, R. P. (1990). Intelligence and reading ability in grades 2–12. Intelligence, 1, 449–455.


Chan, J., & Lynn, R. (1989). The intelligence of six-year olds in Hong Kong. Journal of Biosocial Science, 21, 461–464.

Everitt, B. S. (1996). Making sense of statistics in psychology. A Second-level Course. Oxford: Oxford University Press.

Freyburg, P. S. (1966). The efficacy of the Coloured Progressive Matrices as a group test with young children. British Journal of Educational Psychology, 36, 171–177.

Gray, J. R., Chabris, C. F., & Braver, T. S. (2003). Neural mechanisms of general fluid intelligence. Nature Neuroscience, 6(3), 316–322.

Green, K. E., & Kluever, R. C. (1991). Structural properties of Raven's Coloured Progressive Matrices for a sample of gifted children. Perceptual and Motor Skills, 72, 59–64.

Harris, D. B. (1959). A note on some ability correlates of the Raven Progressive Matrices (1947) in the kindergarten. Journal of Educational Psychology, 50, 228–229.

Jensen, A. R. (1974). How biased are culture-loaded tests? Genetic Psychology Monographs, 90, 185–244.

Kaplan, R. M., & Saccuzzo, D. P. (1997). Psychological testing: Principles, applications and issues (4th ed.). Pacific Grove, CA: Brooks/Cole.

Khatena, J. A. (1965). A study on the reliability of the Raven's Coloured Progressive Matrices 1947. Education Journal, 3, 51–53.

Kilburn, K. L., Sanderson, R. E., & Melton, K. (1966). Relation of the Raven Coloured Progressive Matrices to two measures of verbal ability in a sample of mildly retarded hospital patients. Psychological Reports, 19, 731–734.

Kurth, E. (1970). Erholung der Leistungsnormen bei den Farbigen Progressiven Matrizen. Zeitschrift fur Psychologie, 177, 85–90.

Martin, A. W., & Wiechers, J. E. (1954). Raven's colored progressive matrices and the Wechsler intelligence scale for children. Journal of Consulting Psychology, 18(2), 143–144.

McMorris, B. J., Clements, J., Evans-Whipp, T., Gangnes, D., Bond, L., Toumbourou, J. W., et al. (2004). A comparison of methods to obtain active parental consent for an international study survey. Evaluation Review, 28(1), 64–83.

Muller, R. (1970). Eine kritische empirische Untersuchung des Draw-a-Man Test und der Coloured Progressive Matrices. Diagnostica, 16, 138–147.

Portney, L. G., & Watkins, M. P. (2000). Foundations of Clinical Research. Applications to Practice (2nd ed.). New Jersey: Prentice Hall Health.

Prabhakaran, V., Smith, J., Desmond, J. E., Glover, G. H., & Gabrieli, J. D. E. (1997). Neural substrates of fluid reasoning: An fMRI study of neocortical activation during performance of the Raven's Progressive Matrices Test. Cognitive Psychology, 33, 43–63.

Pruneti, C. A. (1985). Dati normativi de Test P.M. 47 Coloured su un campione di bambini italiani. Bollettino di Psicologia Applicata, 176, 27–35.

Raven, J. (2000). The Raven's Progressive Matrices: Change and stability over culture and time. Cognitive Psychology, 41, 1–48.

Raven, J., Raven, J. C., & Court, J. H. (1998). Section 2: Coloured Progressive Matrices (1998 Edition). Introducing the parallel version of the test. In Manual for the Raven's Progressive Matrices and Vocabulary Scales. Great Britain: Oxford Psychologists Press.

Raven, J. C., Court, J. H., & Raven, J. (1990). Section 2: Coloured Progressive Matrices (1990 Edition, with US Norms). In Manual for the Raven's Progressive Matrices and Vocabulary Scales. Oxford: Oxford Psychologists Press.

Reddington, M. J., & Jackson, K. (1981). The Raven's Coloured Progressive Matrices (1956): A Queensland standardization. Australian Council for Educational Research Bulletin for Psychologists, 30, 20–26.

Rodgers, J. L. (1999). A critique of the Flynn effect: Massive IQ gains, methodological artifacts, or both? Intelligence, 26(4), 337–356.

Sattler, J. M. (1992). Assessment of children. San Diego: Jerome M Sattler, Publisher, Inc.

Schmidtke, A., Schaller, S., & Becker, P. (1990). Manual: Raven–Matrizen-Test: Coloured Progressive Matrices. Weinheim: Beltz Test.

Simoes, M. M. R. (1989). Um Estudo Exploratorio com o Teste das Matrices Progressivas de Raven Para Criancas (CPM/PM47). Paper presented at the "Psychology and Psychologists Today Congress", Lisbon.

Stacey, C. L., & Carleton, F. O. (1955). The relationship between Raven's Coloured Progressive Matrices and two tests of general intelligence. Journal of Clinical Psychology, 11, 84–85.


Stanovich, K. E., Cunningham, E., & Freeman, D. J. (1984). Intelligence, cognitive skills and reading progress. Reading Research Quarterly, 19, 3.

Thorndike, R. L. (1977). Causation of Binet IQ decrements. Journal of Educational Measurement, 14, 197–202.

Valencia, R. R. (1984). Reliability of the Raven Coloured Progressive Matrices for Anglo and for Mexican-American children. Psychology in the Schools, 21(1), 49–52.

Vogt, W. P. (1993). Dictionary of statistics and methodology. A nontechnical guide for the social sciences. Newbury Park: Sage Publications.

Wenke, W., & Muller, U. (1966). Moglichkeiten und Grenzen des Einsatze einzelner diagnosticher Kurzverfahren bei der Schulerauslese. Zeitschrift fur Psychologie, 172, 82–116.

Wiedl, K. H., & Carlson, J. (1976). The factorial structure of the Raven Coloured Progressive Matrices Test. Educational and Psychological Measurement, 36, 409–413.

Yeudall, L. T., Fromm, D., Reddon, J. R., & Stefanyk, W. O. (1986). Normative data stratified by age and sex for 12 neuropsychological tests. Journal of Clinical Psychology, 42(6), 918–946.