Lecture 4

CONSTRUCT VALIDITY

Validity

• A test is said to be VALID if it measures what it is supposed to measure.

Summary …

• There have been many different interpretations of validity.

• There are FOUR main approaches:

1. FACE VALIDITY

2. CONTENT VALIDITY

3. PREDICTIVE VALIDITY

4. CONSTRUCT VALIDITY

Tests in action

• Psychometric tests are now widely used in job selection.

• There, the emphasis is upon PREDICTIVE validity.

• I have 100 applications for three places on a course in electronics.

• Which applicant shall I choose? I know very little about any of the applicants. I have an hour or so to make a decision.

A valid test

• Fortunately, I have a test which enables me to predict success on the course.

• The test is highly reliable; moreover, there is a large body of data showing that those who do best on the test tend to perform best on the electronics course itself.

• My test is not only RELIABLE but also VALID.

Theory

• What exactly is the test measuring? Perhaps it doesn’t really matter.

• It is simply an instrument I use to help select the right candidate.

• There is practical justification for saying, ‘This test measures whatever ability (or abilities) the course requires’!

Practice or theory?

• The usefulness of a test, that is, its PREDICTIVE VALIDITY, is improved by continuously modifying its items so that it meets STATISTICAL criteria.

• But the items that perform best may not seem theoretically to be the best measures of what the test was originally supposed to be measuring.

• Thus there can be a TENSION between considerations of psychometric PERFORMANCE and the building of sound THEORY.

History

• The mental testing movement received an enormous boost from the two world wars.

• New recruits had to be assigned at short notice to activities they could perform.

• Not everyone can be a navigator in a bomber crew, for example.

• In such circumstances, theoretical considerations about what exactly the tests were measuring seemed largely irrelevant, as long as they helped to assign the right person to the right job.

Methodology

• Cognitive psychology makes the greatest use of the EXPERIMENTAL METHOD, because that approach enables the researcher to identify the key variables.

• Psychometrics is an essentially CORRELATIONAL enterprise.

• It is very difficult to identify crucial variables from correlational data.

• It is therefore difficult to map the results of psychometric research on to those of cognitive psychology.

4. Construct validity

• The extent to which a test can be shown to measure a hypothetical construct is known as its CONSTRUCT VALIDITY.

• Here the emphasis switches from PREDICTION to THEORY.

• Of the various kinds of validity, construct validity is by far the most difficult to demonstrate.

Demonstration of construct validity

1. Your test must CORRELATE substantially with SOME other variables (CONVERGENCE).

2. But your measure must also show DISSOCIATION from other variables (DIVERGENCE).

3. Where expected, your measure should also show AGE DIFFERENTIATION. Cognitive ability, for example, increases with age and any supposed test of cognitive ability should reflect this developmental trend.
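The first two criteria can be sketched as a simple check on a set of correlations. This is only an illustration: the function name, the variable names, the correlation values and the two cut-offs (`high`, `low`) are assumptions made for the example, not values given in the lecture.

```python
def check_convergence_divergence(correlations, expected_related,
                                 expected_unrelated, high=0.30, low=0.20):
    """Return (convergent_ok, divergent_ok) for one test.

    correlations: dict mapping another measure's name to its r with the test.
    high / low: illustrative cut-offs (assumptions, not established standards).
    """
    # Convergence: substantial correlation with every measure expected to relate
    convergent_ok = all(abs(correlations[name]) >= high
                        for name in expected_related)
    # Divergence: negligible correlation with every measure expected NOT to relate
    divergent_ok = all(abs(correlations[name]) < low
                       for name in expected_unrelated)
    return convergent_ok, divergent_ok


# Hypothetical correlations of a new test with three other measures
correlations = {"related_test_1": 0.62, "related_test_2": 0.55,
                "unrelated_test": 0.08}

result = check_convergence_divergence(
    correlations,
    expected_related=["related_test_1", "related_test_2"],
    expected_unrelated=["unrelated_test"],
)
print(result)  # (True, True)
```

In practice the judgement rests on theory as well as on the size of the correlations, so fixed cut-offs like these are at best a rough screening device.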

Field Dependence-Independence

• Witkin held that people vary on a hypothetical psychological dimension he called FIELD-DEPENDENCE-INDEPENDENCE.

• The field-independent person is supposed to be able to analyse the total ‘field’ of experience into its component parts and manipulate the parts independently of the overall organisation in order to solve a variety of problems.

• This analytic capacity is claimed to be wide-ranging and to pervade most aspects of a person’s mental life.

Witkin’s tests

I described three of Witkin’s tests:

1. The Rod-and-frame Test (RFT);

2. The Embedded Figures Test (EFT);

3. The Body Adjustment Test (BAT).

Convergence?

• The person who can adjust the rod to the true vertical (in the RFT) should be able to see the embedded figure (in the EFT).

• Such people should also be able to adjust their chairs to the upright position (BAT), despite the tilt of the walls of the artificial ‘room’.

Convergence …

• Since they are supposed to be measuring the same hypothetical construct (field-dependence-independence), Witkin’s tests should certainly correlate highly with one another.

• Since they are cognitive tests, however, they could also be expected to correlate positively with at least SOME of the abilities that are required for performance on an intelligence test.

Witkin’s evidence

• Witkin (and many others) have shown that there are indeed substantial positive CORRELATIONS among the EFT, BAT and RFT tests.

• The person who adjusts the rod to the true vertical can also make the chair upright and quickly spot the embedded figures. The person who cannot spot the embedded figure insists that the rod is vertical when it is actually aligned with the long axis of the frame and claims that a chair is truly upright when it is actually aligned with the tilted room.

Convergent validation

• Each of the three measures correlates significantly and substantially with the other two.

• The correlations in the table below are typical of those found in many studies by many different teams of researchers.

• The criterion of CONVERGENCE is met by Witkin’s tests.

Just intelligence?

• Witkin’s measures correlate positively with the Full Scale WAIS IQ.

• For example, one study (Witkin, 1965) showed that EFT and WAIS IQ correlated significantly: r(72) = .36; p < .01.

• Is Witkin’s hypothetical construct simply INTELLIGENCE? Is there really a separate dimension of Field-Dependence-Independence?

• To make his case, Witkin must also show theoretically meaningful DISSOCIATION, or DIVERGENCE, of his measures from other cognitive activities.
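A correlation reported as r(72) = .36 can be checked for significance by converting it to a t statistic. The sketch below assumes that the number in brackets is the degrees of freedom (n - 2), which is the usual reporting convention; the function name is made up for the example.

```python
from math import sqrt


def t_from_r(r, df):
    """t statistic for testing a Pearson r against zero, with df = n - 2."""
    return r * sqrt(df / (1.0 - r * r))


t_eft_iq = t_from_r(0.36, 72)
print(round(t_eft_iq, 2))  # 3.27
```

The two-tailed critical value of t at the .01 level with 72 degrees of freedom is roughly 2.65, so a t of about 3.27 is consistent with the reported p < .01.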

The WAIS items

1. Information.

2. Picture Completion.

3. Digit Span.

4. Picture Arrangement.

5. Vocabulary.

6. Block Design.

7. Arithmetic.

8. Object Assembly.

9. Comprehension.

10. Digit Symbol.

11. Similarities.

The ‘analytical’ subgroup

• Consider: Block Design, Picture Arrangement and Object Assembly.

• According to Witkin, these three tests all require the participant to analyse the field into its component parts and reassemble them to solve the problem. This is not true of other subtests, such as Vocabulary, Comprehension or Digit Span.

• Witkin therefore predicted that the EFT should correlate highly with the tests in the ‘analytical’ subgroup, but not significantly with the other WAIS items.

Divergence

• The EFT does indeed correlate highly with the Kohs blocks, from the analytical subgroup.

• But the correlation with non-analytic items such as Vocabulary is insubstantial and insignificant.

• Witkin has demonstrated the DIVERGENCE he needs to demonstrate the CONSTRUCT VALIDITY of his tests as measuring a distinct dimension of cognition.

Construct validity of Witkin’s tests

• Witkin has made a cogent case for the construct validity of his tests of field-dependence-independence.

• There is CONVERGENCE: the tests correlate substantially among themselves; and they also correlate significantly with IQ, as they should do.

• But there is also DIVERGENCE: the tests correlate strongly with the analytical subgroup of WAIS tests; but they do not correlate with ‘non-analytic’ items such as vocabulary and arithmetic.

Nonverbal working memory

In the first lecture, I described two measures of non-verbal working memory:

1. The Corsi Blocks Test;

2. The Visual Patterns Test.

The Corsi and Visual spans

• The Corsi Span is the length of the longest sequence of tapped blocks that the participant can correctly reproduce.

• The Visual Span is the size of the largest pattern that the participant can correctly reproduce.

The Visual Patterns Test: Does it have construct validity?

• It is claimed that the Visual Patterns Test measures visual storage in purer form than the Corsi Blocks Test, which measures visual plus spatial working memory.

• But could both tests be measuring the same functions?

Convergence

• The VP and the CB should correlate positively and significantly. But, since the CB taps more than visual memory, the correlation should be far from perfect.

• This is, in fact, the case. There is a significant correlation between the VP and CB tests: r(74) = .27; p < .01. This value of r is similar to the correlation between Field-Dependence-Independence and IQ: although significant, it is suitably small.

• This correlation accounts for less than 10% of the variance (coefficient of determination: r² = .27² ≈ .07).
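The proportion of variance shared by two measures is the square of their correlation. As a quick check, the sketch below squares the two correlations quoted in this lecture; the function name and the labels are chosen only for the example.

```python
def shared_variance(r):
    """Coefficient of determination: proportion of variance shared (r squared)."""
    return r * r


for label, r in [("VP with CB", 0.27), ("EFT with WAIS IQ", 0.36)]:
    print(f"{label}: r = {r:.2f}, r^2 = {shared_variance(r):.3f}")
```

Both values are well under a quarter of the variance, which is why correlations of this size can be statistically significant and still leave most of the variation unexplained.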

Divergence

• The claim is that the Corsi and Patterns tests are not measuring the same functions.

• If we can manipulate a theoretically relevant variable and demonstrate differential effects upon the Corsi and Pattern spans, we shall have produced evidence to confirm this claim.

An experiment

Della Sala, S., Gray, C., Baddeley, A., Allamano, N., & Wilson, L. (1999) Pattern span: A tool for unwelding visuo-spatial memory. Neuropsychologia, 37, 1189-1199.

The experiment

• First, we obtained the Corsi and Visual Patterns spans.

• Next, the participants performed an interference task.

• Finally, the Corsi and Visual Patterns spans were redetermined.

• As expected, the new spans were shorter, as a result of the interference.

Interference tasks

• There were two kinds of interference tasks:

1. Visual;

2. Spatial.

• We should find that Visual interference has a greater effect upon the Visual Patterns span; but Spatial interference should have more effect upon the Corsi span.

A graph showing the differential effects of interference

[Figure: two panels, Visual Patterns and Corsi Blocks]

The dissociation pattern

• Visual interference has a much greater shortening effect upon the Pattern Span than upon the Corsi Span.

• Spatial interference has a much greater shortening effect upon the Corsi Span than it does upon the Pattern Span.

• Such DIVERGENCE supports the claim that the Patterns and Corsi tests measure different kinds of nonverbal working memory.
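The crossover pattern described above can be expressed as a comparison of span reductions. The sketch below is illustrative only: the mean spans are invented for the example and are NOT the data of Della Sala et al. (1999), and expressing each drop as a proportion of the baseline span is an assumption made here because the two spans are measured on different scales.

```python
def proportional_drop(before, after):
    """Reduction in span, expressed as a fraction of the baseline span."""
    return (before - after) / before


def shows_double_dissociation(baseline, post):
    """True if each kind of interference hurts 'its own' test more than the other."""
    def drop(test, kind):
        return proportional_drop(baseline[test], post[kind][test])

    visual_hits_patterns_more = drop("patterns", "visual") > drop("corsi", "visual")
    spatial_hits_corsi_more = drop("corsi", "spatial") > drop("patterns", "spatial")
    return visual_hits_patterns_more and spatial_hits_corsi_more


# Hypothetical mean spans (invented for illustration)
baseline = {"patterns": 9.0, "corsi": 6.0}
post = {
    "visual": {"patterns": 6.5, "corsi": 5.6},   # spans after visual interference
    "spatial": {"patterns": 8.4, "corsi": 4.2},  # spans after spatial interference
}

print(shows_double_dissociation(baseline, post))  # True
```

A real analysis would test the interaction between type of test and type of interference statistically, rather than simply comparing means.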

Age differentiation

• If a test is supposed to measure a cognitive function, performance on the test should show a typical age trajectory.

• The Visual Patterns test does indeed show the expected decline from early adulthood: r(345) = -.55; p < .01.

• The Corsi Blocks test also shows a similarly substantial negative correlation with age.

The Colours Test

• Psychological tests are widely used in industry.

• The test I am about to describe is used in the oil industry to help to assign an employee to the role in a team for which he is best suited.

• The attributes supposedly measured by the test are letter- and colour-coded, and the management take note of the colour codes when assigning employees to team projects.

Four team functions

• A (RED). Directing and leading.

• B (YELLOW). Sociability.

• C (BLUE). Troubleshooting.

• D (GREEN). Thinking and planning.

The Test Instrument

• The response sheet has 28 boxes to be completed.

• In each box, circle the response that you are:

– Most like;

– Least like.

• (Your ‘instinctive response’ is probably the most accurate. First thoughts are best, here. So try to answer the questions quickly.)

Analysis

• Transfer your “Difference” scores to this sheet.

• Draw a line through the scores.

• The highest values on the page are your “Dominant” colours.

• This person’s dominant colours are A and D.

• This person is a leader and a planner.

Interpretation

A reliability study

• I have carried out an informal investigation of the test-retest reliability of the Colours Test.

• I gave the Colours Test twice to this class, with a week between the two sessions.

• I obtained sixty-one pairs of responses.

Preliminary analysis

• The profiles are based on the four difference scores.

• Here is the test-retest reliability for each of these four measures.
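A test-retest reliability coefficient of this kind is the Pearson correlation between the scores from the first and second sessions. The sketch below computes it from scratch for one scale; the eight pairs of scores are invented for illustration and are not the class data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two lists of equal length with at least 3 pairs")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and of squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical difference scores on one scale for eight people
week_1 = [9, -3, 4, 0, -6, 7, 2, -1]
week_2 = [8, -2, 5, 1, -5, 6, 1, -2]

retest_r = pearson_r(week_1, week_2)
print(f"test-retest r = {retest_r:.2f}")
```

The same calculation would be repeated for each of the four difference scores, giving one reliability coefficient per scale.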

Directing (A; Red)

• The scatterplot is a narrow ellipse.

• There should be a very high correlation.

• Indeed there is: r(61) = .90; p < .01.

• This level of reliability is very acceptable.

Thinking (D; Green)

• The scatterplot is a narrow ellipse.

• The correlation should be high.

• It is high: r(61) = .85; p < .01.

• This level of reliability is also very acceptable.

Relating (C; Blue)

• The scatterplot is a narrow ellipse.

• The correlation should be high.

• It is: r(61) = .83; p < .01.

• This level of reliability is very acceptable.

Sociability (B; Yellow)

• This time the scatterplot is messier: there are some outliers.

• We cannot expect the value of r to be so high.

• Indeed, it is not: r(61) = .76; p < .01.

• This level of reliability is just acceptable.

Appraisal

• The Colours Test would appear to be reliable, at least when used with Level 2 students at this university.

• What is needed is another (larger) study with oil workers.

• THE NORMS FOR A TEST SHOULD ALWAYS BE GATHERED FROM THE POPULATION IN WHICH THE TEST IS TO BE USED.

Appraisal …

• On the basis of the evidence we have, the test appears to be reliable.

• But is it also VALID?

• Do the PROFILES match up with the employees’ ACTUAL PERFORMANCE in the team roles to which they have been assigned?

• Managers think they do; but the validity of the Colours Test has yet to be confirmed statistically.

Summary

• A test is VALID if it measures what it is supposed to measure.

• This simple definition, however, is open to a variety of interpretations.

• Today, I have considered CONSTRUCT VALIDITY, the kind of validity that is the most problematic of all.

• To demonstrate the construct validity of a test, the researcher must show, not only that the test correlates with the ‘right’ variables, but also that it dissociates from the ‘wrong’ ones.

• These two essential properties are known as CONVERGENCE and DIVERGENCE.

Summary …

• Witkin’s tests of Field-Dependence-Independence show convergence with other ‘analytical’ cognitive tests and dissociation from ‘non-analytical’ tests.

• The Visual Patterns and Corsi tests of nonverbal working memory correlate to some extent (convergence) but the Corsi and Pattern spans are affected in opposite directions by visual and spatial interference (divergence).

Practice question

What is meant by the reliability and the validity of a psychological test? What is the relationship between the two properties? Describe one approach to the determination of validity.