Utilizing Rasch Analysis to compare the psychometric properties of four mindfulness measures and conduct scale revision

Sharon Solloway, Bloomsburg University
Theo Dawson, Developmental Testing Service, INC.

Online version

Disclosure

• I have no actual or potential conflict of interest in relation to this program/presentation.

Optimizing instruments

• Optimizing a survey instrument involves finding a good balance between survey length, statistical reliability, and various forms of validity.

• Length: Survey instruments that are too long are impractical, so care should be taken to ensure that a survey is no longer than it needs to be to meet reliability and validity standards.

• Reliability: If the instrument is intended for individual assessment, the corrected reliability (for the targeted population) should be high enough to distinguish an adequate number of levels of performance.

• Validity: A survey should measure what it is designed to measure.
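The trade-off between length and reliability can be illustrated with the classical Spearman-Brown prophecy formula. The presentation does not use this formula; it is a standard textbook approximation, shown here only as a sketch. The function name and the example figures (a reliability of .95 at 33 items, shortened to 21 items) are assumptions chosen for illustration.

```python
def spearman_brown(reliability, length_ratio):
    """Predict reliability after changing test length by `length_ratio`.

    length_ratio = new number of items / original number of items.
    Assumes the added or removed items are comparable to the rest.
    """
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)


# Illustrative only: shortening a 33-item scale with reliability .95 to 21 items.
predicted = spearman_brown(0.95, 21 / 33)
print(f"predicted reliability at 21 items: {predicted:.2f}")
```

Under these assumptions the predicted reliability is about .92, which suggests that a well-chosen shorter form can stay above the level needed for individual assessment.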

Validity (part 1)

• Content validity: How well does an assessment represent the elements of a construct?

• The SMS was designed to measure intentional attention and its effects. Its items are based on evidence taken from the journals of students who were learning a mindfulness practice.

• Construct validity, convergent: Higher scores on the survey should be positively related to greater experience with mindfulness practice.

• Construct validity, discriminant: Higher scores on the survey should not be related to sex or ethnic affiliation (in the target population).

Validity (part 2)

• Construct validity, psychometric: The survey can be shown to capture a construct that is adequately unidimensional, with items that "mean the same thing" to "everyone" who takes the assessment.

• Criterion validity, predictive: The measure captures change over time, preferably change that is linked to other outcomes, such as behavioral change.

• Criterion validity, convergent: The measure is correlated with other, similar measures.

• SMS correlation with other scales: MAAS, r = .43 (n = 192), TMS curious, r = .70 (n = 258), TMS open, r = .61 (n = 261), FFMQ noticing, r = .55 (n = 42), FFMQ non-reactivity, r = .44 (n = 42).
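Correlations like those listed above can be computed as follows. This is a minimal sketch, not the authors' analysis code: the function names and the six pairs of scores are hypothetical. The Spearman variant is simply the Pearson correlation of average ranks.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical total scores for six respondents on two scales.
sms_scores = [52, 61, 58, 70, 66, 75]
other_scores = [40, 47, 49, 55, 50, 60]
print(round(pearson_r(sms_scores, other_scores), 2))
print(round(spearman_rho(sms_scores, other_scores), 2))
```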

Measuring performances vs. trends

• Most mindfulness scales have been developed to conventional research standards for precision (alpha = .75+).

• This level of precision is inadequate for the assessment of individuals, where (corrected*) alphas in the region of .85 or higher are required to make meaningful measurements.

• If we take measurement error into account, a corrected alpha of .70 allows us to detect 2 distinct levels of an attribute. A corrected alpha of .80 allows the detection of 3 levels, and detecting 4 levels requires a corrected alpha of .90.

• Because the SMS is designed for use in the classroom, where making genuine distinctions between individual

Comparison of four scales (part 1)

• Between 2006 and 2009, these surveys were taken by hundreds of undergraduate students who received instruction in mindfulness practice.

• The SMS was the only instrument that met the reliability requirements for individual assessment.
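For reference, the ordinary (uncorrected) Cronbach's alpha behind reliability comparisons of this kind can be sketched as below. The presentation does not say how its corrected alpha was obtained, so that correction is not reproduced here; the function name and the small response matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical ratings from five respondents on four items.
ratings = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(ratings), 2))
```

Alpha rises when items covary strongly relative to their individual variances, which is why longer scales with coherent items tend to reach the higher values needed for individual assessment.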

Comparison of four scales (part 2)

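| Scale | N | Items | Alpha | Rasch reliability | G* | Strata (4G+1)/3 |
| --- | --- | --- | --- | --- | --- | --- |
| FFMQ non-reactivity | 320 | 7 | 0.68 | 0.6 | 1.21 | 1.61 |
| FFMQ noticing | 320 | 8 | 0.74 | 0.7 | 1.68 | 2.24 |
| FFMQ inattention | 320 | 8 | 0.84 | 0.79 | 1.96 | 2.61 |
| FFMQ finding | 320 | 8 | 0.71 | 0.64 | 1.35 | 1.8 |
| FFMQ emotion | 320 | 8 | 0.84 | 0.76 | 1.77 | 2.36 |
| MAAS | 538 | 15 | 0.85 | 0.81 | 2.06 | 2.75 |
| TMS curious | 474 | 7 | 0.84 | 0.77 | 1.85 | 2.47 |
| TMS open | 474 | 8 | 0.75 | 0.72 | 1.6 | 2.13 |
| SMS original | 873 | 33 | 0.95 | 0.93 | 3.62 | 4.83 |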
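The relationship between reliability, the separation index G, and strata can be sketched as follows, assuming the usual Rasch definitions: G = sqrt(R / (1 - R)) and strata = (4G + 1) / 3, the formula given in the column header. The function names are hypothetical. Note that the Strata values tabulated above appear to correspond to 4G/3 (for example, 4 × 1.21 / 3 ≈ 1.61) rather than to the header formula; the sketch follows the header formula.

```python
from math import sqrt


def separation(reliability):
    """Separation index G: ratio of true spread to measurement error."""
    return sqrt(reliability / (1.0 - reliability))


def strata(g):
    """Number of statistically distinct levels, using (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0


# Reliability thresholds discussed earlier in the presentation.
for r in (0.70, 0.80, 0.90):
    g = separation(r)
    print(f"reliability={r:.2f}  G={g:.2f}  strata={strata(g):.2f}")
```

Under these definitions, reliabilities of .70, .80, and .90 give roughly 2.4, 3.0, and 4.3 strata, in line with the 2, 3, and 4 levels cited for those thresholds.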

SMS 2008-33 and SMS 2008-21

• Item fit for the 33-item version of the SMS indicated that some items were not working well with the others to represent the construct.

• Ten of the 33 original items underfit the model generated with the Rasch analysis software, with infit Z's greater than 2.0.

• To address this problem, we conducted a series of Rasch analyses in which we eliminated underfitting items, reaching a reasonable 21-item solution, covering all of the major components of the construct, with only 5 items that underfit the model.
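The fit statistics behind this screening can be sketched for the simplest case. The sketch assumes a dichotomous Rasch model (each response is 0 or 1); the presentation does not state the SMS response format, and a rating-scale instrument would need category-level expected scores and variances instead. It computes infit and outfit mean squares only; the z-standardized values used as the criterion above are a further transformation not shown here. All names and data are hypothetical.

```python
from math import exp


def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))


def item_fit(responses, thetas, difficulty):
    """Return (infit_ms, outfit_ms) for one item.

    responses: 0/1 responses to the item, one per person.
    thetas:    person measures in logits, in the same order.
    """
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, difficulty)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Infit: information-weighted, so well-targeted persons dominate.
    infit = sum(squared_residuals) / sum(variances)
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(variances)
    return infit, outfit


# Hypothetical data: six persons responding to one item of difficulty 0.2 logits.
person_measures = [-1.5, -0.5, 0.0, 0.5, 1.0, 2.0]
item_responses = [0, 1, 0, 1, 1, 1]
print(item_fit(item_responses, person_measures, 0.2))
```

Mean squares near 1 indicate responses about as noisy as the model expects; values well above 1 indicate underfit, the pattern that led to items being dropped.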

SMS 2008-33 and SMS 2008-21, cont.

| Scale | N | Items | Alpha | Rasch reliability | G* | Strata (4G+1)/3 |
| --- | --- | --- | --- | --- | --- | --- |
| SMS 2008-33 | 873 | 33 | 0.95 | 0.93 | 3.62 | 4.83 |
| SMS 2008-21 | 873 | 21 | 0.93 | 0.91 | 3.14 | 4.17 |

SMS 2008-33 and SMS 2008-21, cont.

| Dimension (n = 873) | 2008-33 | 2008-21 |
| --- | --- | --- |
| Range of person estimates | 87 | 99 |
| Range of item means | 25 | 12 |
| Average standard error of person estimates | 3.82 | 5.3 |
| Rasch reliability | 0.93 | 0.91 |
| Strata (levels along the dimension) | 4.7 | 4.2 |
| Item fit (number of items with infit and outfit z-standards above 2.0) | 4, 10 | 5, 5 |
| Variance explained by measures | 37.8% | 38.3% |
| Correlation with age | .129* | .139* |
| Correlation with education | .123* | .134* |
| Correlation with meditation experience | .244* | .248* |
| Correlation with test time (respondents who took the assessment more than once, n = 194) | .347* | .337* |
| Correlation with sex (Spearman) | 0.005 | -0.005 |

SMS 2008-21, pre and post

• To further evaluate item functioning, we examined item performance in pre-instruction and post-instruction conditions.

• SMS items behaved better in the post-instruction condition than in the pre-instruction condition.

| Scale | N | Items | Alpha | Rasch reliability | G* | Strata (4G+1)/3 |
| --- | --- | --- | --- | --- | --- | --- |
| SMS 2008-21 (pre-instruction) | 194 | 21 | 0.87 | 0.87 | 2.62 | 3.49 |
| SMS 2008-21 (post-instruction) | 194 | 21 | 0.91 | 0.9 | 3.04 | 4.06 |

Rewriting item stems

• We hypothesized that the wording of some of the items could be causing the scale to perform differently in pre- and post-instruction conditions.

• Some of the items included phrases like, "I am learning," or "I feel more," which would not mean the same thing to people in pre- and post-instruction conditions.

• Twelve items were reworded to eliminate this bias.

• For example, "I am learning how to reduce my stress," was reworded as "I am able to consciously reduce feelings of stress."

• Three items that were not in the 2008-21 version, but were in the 2008-33 version, were reworded and added back into the item pool, resulting in the 2010-24 version of

2008-21 vs. 2010-24 (part 1)

| Scale | N | Items | Alpha | Rasch reliability | G* | Strata (4G+1)/3 |
| --- | --- | --- | --- | --- | --- | --- |
| SMS 2008-21 (2010 administration) | 189 | 21 | 0.93 | 0.92 | 3.46 | 4.61 |
| SMS 2010-24 (2010 administration) | 189 | 24 | 0.94 | 0.94 | 3.87 | 5.16 |

2008-21 vs. 2010-24 (part 2)

| Dimension | 2008-21 | 2010-24 |
| --- | --- | --- |
| Range of person estimates | 63 | 72 |
| Range of item means | 12 | 18 |
| Average standard error of person estimates | 2.72 | 2.6 |
| Rasch reliability | 0.92 | 0.94 |
| Strata (levels along the dimension) | 4.6 | 5.2 |
| Item fit (number of items with infit and outfit z-standards above 2.0) | 2, 3 | 2, 1 |
| Variance explained by measures | 43.5% | 46.3% |
| Correlation with age | .172* | .177* |
| Correlation with education | .197* | .181* |
| Correlation with meditation experience | .395* | .436* |
| Correlation with sex (Spearman) | .181* | 0.126 |

2008-33 vs. 2010-24

| Dimension | 2008-33 | 2010-24 |
| --- | --- | --- |
| N | 873 | 189 |
| Range of person estimates | 87 | 72 |
| Range of item means | 25 | 18 |
| Average standard error of person estimates | 3.82 | 2.6 |
| Rasch reliability | 0.93 | 0.94 |
| Strata (levels along the dimension) | 4.7 | 5.2 |
| Item fit (number of items with infit and outfit z-standards above 2.0) | 4, 10 | 2, 1 |
| Variance explained by measures | 37.8% | 46.3% |
| Correlation with age | .129* | .177* |
| Correlation with education | .123* | .181* |
| Correlation with meditation experience | .244* | .436* |
| Correlation with test time | .382* | n/a |
| Correlation with sex (Spearman) | 0.005 | 0.126 |

The meaning of mindfulness levels

• With a Rasch analysis, if the items fit the model, the item hierarchy generated by the data tells a story of growth in ability along the construct being measured.

• As you move up the hierarchy from bottom to top, the items tell a story about growth in mindfulness:

• increasing awareness,

• increasing openness,

• an increasing ability to consciously enter a state of well-being, and

• an increasing ability to evoke compassion.
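The idea of an item hierarchy can be sketched with the dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between the person measure and the item location. This is a simplification for illustration: the item locations below are hypothetical values in logits, not estimates from the SMS, and only the bottom-to-top order of the four themes is taken from the list above.

```python
from math import exp


def endorsement_probability(person, item):
    """Dichotomous Rasch model: P(endorse) given person and item locations."""
    return 1.0 / (1.0 + exp(-(person - item)))


# Hypothetical item locations (logits), ordered from easiest to hardest to endorse.
items = [
    ("awareness", -1.5),
    ("openness", -0.5),
    ("well-being", 0.5),
    ("compassion", 1.5),
]

# Three hypothetical persons at low, middle, and high levels of the construct.
for person in (-1.0, 0.0, 1.0):
    probabilities = [round(endorsement_probability(person, b), 2) for _, b in items]
    print(f"person at {person:+.1f} logits: {probabilities}")
```

For any one person the probabilities fall as the items get harder, and as the person measure rises each item becomes more likely to be endorsed. That shared ordering is what lets the hierarchy be read as a description of growth.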

Narratives

• Levels on the SMS correspond to student experiences as they are described in journal entries.

Next steps

• Investigate the stability of the metric in a variety of conditions: different age groups, different levels of mindfulness experience, and different contexts.

• Model the "predictive" validity of the metric by comparing growth on the metric with evidence of growth from student journals.