Development and Validation of a health status measure

Susan Stock, MD MSc FRCPC
Institut national de santé publique du Québec
Direction de santé publique de Montreal-Centre
Dept of Epidemiology, Biostatistics and Occupational Health, McGill

513-611A STUDY DESIGN AND ANALYSIS I
Sept 29, 2003


Susan Stock : Developing & Validating Health Status Measures

Plan

- Types of Health Status Measures
- Steps in the development of a health status measure
- Steps in the development of the Neck and Upper Limb Index
- Steps in the validation of a health status measure
- Steps in the validation of the Neck and Upper Limb Index


Health status measure

A health outcome questionnaire that quantifies symptoms, function, feelings and/or behaviour directly from the respondent to measure overall health status (generic instrument) or disorder-specific health status

Vary in scope:
- Activities of daily living ("ADL", e.g. self care, mobility)
- Functional status: measure capacity or performance of physical functioning, e.g. household tasks, work, recreational activities
- Health-related "quality of life" instruments: measure not only physical functioning but also psychological, social and role functioning


Health status measures

- Allow patient/subject to identify impact of a disorder or health problem on his/her life across many dimensions based on his/her experience rather than the interpretation of a health care professional
- Useful in a wide range of studies and clinical contexts:
  - In studies of aetiology, prevalence and prognostic factors they can be incorporated into case definitions that distinguish according to severity
  - In intervention studies and health services research they can be used as the primary outcome to demonstrate change over time in health status


Development of Health Status Measures: references

- Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. Second edition. New York: Oxford University Press, 1995: 28-53
- Guyatt GH, Bombardier C, Tugwell PX. Measuring disease-specific quality of life in clinical trials. CMAJ 1986; 134: 889-895
- Guyatt GH, Jaeschke R, Feeny DH, Patrick DL. Measurement in clinical trials: Choosing the right approach.
- Juniper EF, Guyatt GH, Jaeschke R. How to develop and validate a new health-related quality of life instrument.
  In Spilker B (ed), Quality of Life and Pharmacoeconomics in Clinical Trials, Second edition. Lippincott-Raven Publishers, Philadelphia, 1996


Neck and Upper Limb Index (NULI)

Health-related quality of life instrument:
- specific to neck and upper extremity musculoskeletal disorders
- capable of measuring changes within subjects over time in intervention studies
- capable of distinguishing between subjects (i.e., assess severity) in prognostic, prevalence or etiologic studies
- applicable to both French and English speaking populations in Canada, and
- practical and easy to use in clinical settings


Neck and Upper Limb Index (NULI)

In order to develop an instrument that was equally appropriate to the two major cultural and linguistic groups in Canada:
- Conducted two separate studies with similar protocols for item reduction and selection and subsequent validation
  - one in an Ontario English-speaking population
  - the other in a Quebec French-speaking population


Steps in development of a health status measure

- Search for appropriate existing measure!
- If none available:
  - Identify domains of interest
  - Generate potential items
  - Refine items and pre-test
  - Choose appropriate response scale(s) for the items
  - Carry out item reduction and item selection strategies


Steps in development of the NULI

- Identification of domains of interest
- Generation of potential items
- Item refinement and pre-testing
- English item reduction and selection study
- French translation of potential items
- French item reduction and selection study
- Comparison of English and French results
- Selection of 20 items appropriate for both populations
- Reliability and validity testing of the final 20-item instrument in both English and French populations


Domain

A dimension of life potentially affected by the disorder or health problem in question
- e.g. self care, household responsibilities, work, social life, sexual life, mood, self esteem, transportation, recreation, sleep, financial impact of disorder, iatrogenic effect of evaluation or treatment


Identifying domains & generating items

Strategies for identifying the most appropriate domains of interest and for generating potential items are aimed at optimizing content validity:
- the extent to which the measurement incorporates all the relevant content or domains of the phenomenon under study


NULI: Identifying domains & generating items

- review of relevant literature (rheumatology, rehabilitation, orthopaedics, back pain)
- review of existing health status instruments identified by bibliographic search and contact with experts
- clinical experience of investigators
- survey of 30 clinicians
- interviews with 33 worker-patients who presented with neck and/or upper limb disorders in five clinical occupational health settings


Evaluating content validity of existing instruments

- Identify relevant domains for the concept of interest and evaluate whether instruments measure these domains adequately
- Identify number or proportion of items in each instrument that are not relevant to the concept you wish to measure

Ref: Stock SR, Cole DC, Tugwell P. Review of applicability of existing functional status measures to the study of workers with musculoskeletal disorders of the neck and upper limb. Am J Indust Med 1996, 29, 679-688


An example of evaluation of content validity

Distribution of items among domains for selected musculoskeletal functional status instruments

Instrument  Work  Self care  Household & family  Social life  Recreation  Sleep  Mood  Sex life
PDI         2     4          2                   -            2           -      -     -
DRI         2     1          1                   -            2           -      -     -
NULI        4     1          4                   -            2           2      4     -
DASH        1     1          9                   1            3 (+4)      1      -     1
ASES        1     4          -                   -            1           1      -     -


NULI: Identifying domains & generating items

- 80 questions in 8 domains identified through investigator clinical experience, existing instruments and literature
- 52 additional items and 2 domains generated by clinician survey
- 48 additional items and 2 more domains identified by patient interviews
- Total of 12 domains identified


Item refinement

- Redundant items eliminated
- Pool of approximately 150 items with 7-30 items per domain
- Wording of items
  - Literacy editor to ensure Grade 6 language
- "Applicability": Screening question developed for activity-related items to evaluate whether the item was applicable or relevant to the subject (work, household and family responsibilities, transportation/driving, recreation, and social activities; sexual life)
  - Vacuuming, shovelling snow
  - Sports activities


Item refinement: Choice of response scale

- Response scale: 7-point numbered scale with verbal anchors
- Maximize reliability: reliability of a scale rises rapidly as the number of divisions increases to seven and then rises more slowly until there are 11 points (Streiner and Norman 1995, Nunnally and Wilson 1975, Nishisato and Torii 1985)


Response scale: number of points on a scale

- Loss of test-retest reliability:
  - 7-10 categories: little reduction of reliability
  - 5 categories reduces reliability by 12%
  - 2 categories reduces reliability by 35%
- Optimum number of points recommended: (5 to) 7 categories
  (Reference: Streiner and Norman 1995, Chap 4)
- Treating rating scales as interval data statistically will result in less measurement error when there are more items
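
The effect of the number of response categories on test-retest reliability can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis behind the Streiner and Norman figures: the normally distributed "true" score, the noise level, the equal-width cut-points and the names `to_category` and `retest_correlation` are all hypothetical, so the simulated losses will not reproduce the 12% and 35% quoted above.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def to_category(value, k):
    """Cut a continuous score into k equal-width categories over [-3, 3] (assumed cut-points)."""
    clipped = max(-3.0, min(3.0, value))
    return min(k - 1, int((clipped + 3.0) / 6.0 * k))

def retest_correlation(k, n=5000, noise_sd=0.5, seed=1):
    """Simulated test-retest correlation when responses are recorded on a k-point scale."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        true_score = rng.gauss(0, 1)  # hypothetical underlying health status
        first.append(to_category(true_score + rng.gauss(0, noise_sd), k))
        second.append(to_category(true_score + rng.gauss(0, noise_sd), k))
    return pearson(first, second)

results = {k: retest_correlation(k) for k in (2, 5, 7, 10)}
for k, r in results.items():
    print(f"{k:2d} categories: test-retest r = {r:.3f}")
```

Under these assumptions the correlation is lowest for the 2-point scale and improves as categories are added, with smaller gains beyond about seven points.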


Scaling: number of points on a scale

Potential sources of error when there are few points on a scale:
- Uncertainty, confusion of respondents
- Reduction in reliability
- Loss of efficiency of the instrument
- More subjects needed to show an effect (S Suissa J Clin Epidemiol 1991, 44: 241-8)
- Lower correlation with other measures (Hunter & Schmidt 1990, J Applied Psychol 75:334-49)


Pre-test

- Pre-test in 10 clients with musculoskeletal disorders of neck or upper extremity in a vocational rehabilitation clinic
- To identify questions that are unclear, ambiguous, difficult to understand or inappropriate
- Revise items following pre-test


Inter-rater reliability testing

- Inter-rater reliability study of revised potential items
- English study conducted on 38 worker-patients with neck and upper limb disorders in four clinical settings prior to the item selection study; French inter-rater reliability study was conducted with 16 worker-patients
- 2 raters interviewed each patient on the same day, at 2-4 hour intervals
- Following the second interview, feedback was sought from respondents to further identify any ambiguous items or those difficult to understand
- ICC (intraclass correlations) calculated for the mean of items in each domain and for each individual item
- Items with low inter-rater reliability (ICC<0.7) identified and source of difficulty reviewed with the interviewers
- Items were reformulated where indicated
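
As an illustration of the ICC screening step, the sketch below computes a one-way random-effects ICC for each item and flags those below 0.7. The slides do not say which form of the ICC was used, so the one-way form is an assumption; the function name `icc_oneway`, the item names and the ratings are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    ratings: list of tuples, one per subject, each holding the scores
    given by k raters (k = 2 in the study design described above).
    Raises ZeroDivisionError if every rating is identical.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    subject_means = [sum(r) / k for r in ratings]
    # Between-subject and within-subject mean squares from one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, subject_means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical scores on the 7-point scale: (rater 1, rater 2) per subject
item_ratings = {
    "item_a": [(1, 1), (4, 4), (7, 7)],          # perfect agreement
    "item_b": [(1, 2), (3, 4), (5, 6), (6, 7)],  # small, consistent differences
    "item_c": [(2, 6), (5, 3), (4, 4), (6, 2)],  # raters disagree
}

iccs = {name: icc_oneway(r) for name, r in item_ratings.items()}
flagged = [name for name, value in iccs.items() if value < 0.7]

for name, value in iccs.items():
    print(f"{name}: ICC = {value:.3f}")
print("Review with interviewers:", flagged)
```

Items in `flagged` are the ones that would be reviewed with the interviewers and reformulated where indicated.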


Interviewer training

3-5-day training sessions for interviewers
- to be familiar with content of questions, use of scales
- to teach appropriate standardised technique
- interviewers trained to probe in a non-directive, non-biasing fashion, and be interpersonally neutral
- feedback on tape-recorded interviews
- role-playing of interviews with potentially difficult subjects


Interviewer training

- To reduce bias and random error and ensure strict adherence to research protocol
- Inform re purpose of study, type of data to be gathered, how results will be used
- Familiarize with questionnaire, understand every item
- How to handle first meeting with respondent, techniques for building rapport
- How to answer questions commonly asked by respondents
- Confidentiality procedures
- When and how to probe
- How to ask questions
- How to record responses
- Checking the questionnaire
- How to end interviews
- How to deal with special situations (angry, tearful, or verbose respondents)


Item reduction studies

Study procedure:
- Pre and post-treatment administration of 170 potential items and validating measures to 119 English-speaking Ontario workers and to 93 French-speaking Quebec workers with neck or upper limb disorders recruited from occupational and physiotherapy clinics
  - 7-30 specific items in each of the 12 domains including a global question about the overall impact of the disorder on that domain
- An additional administration 3-7 days after the initial administration for test-retest reliability
- Subjects rank ordered the 12 domains according to the relative importance of the impact of their musculoskeletal disorder on these dimensions of their lives


NULI Item reduction

Objective of item reduction:
- To identify and omit items that were irrelevant, had poor test-retest reliability, discriminated poorly or were unresponsive to change


Criteria for item reduction

- Applicability of activity related items
  - Eliminate items not applicable to at least 80% of the study population
  - Eliminate items not applicable to at least 70% of men and 70% of women
    - e.g. vacuuming applicable to 49% of men and 83% of women
    - Shovelling snow not applicable to 82% of women
- Reproducibility
  - Eliminate items with Pearson correlation coefficient ≤ 0.5
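
The applicability and reproducibility criteria can be expressed as a simple screening rule. The sketch below is an illustration only: the function name `keep_item`, the overall applicability figure for vacuuming, the test-retest correlations and the second item are hypothetical, and treating the reproducibility cut-off as "retain only if r is above 0.5" is an assumption about the threshold direction.

```python
def keep_item(pct_all, pct_men, pct_women, retest_r,
              min_all=80.0, min_each_sex=70.0, min_retest_r=0.5):
    """Return True if an activity-related item passes both screens.

    pct_all, pct_men, pct_women: percentage of respondents for whom
    the item is applicable (overall, men, women).
    retest_r: Pearson test-retest correlation for the item.
    """
    applicable = (
        pct_all >= min_all
        and pct_men >= min_each_sex
        and pct_women >= min_each_sex
    )
    reproducible = retest_r > min_retest_r  # assumed direction of the cut-off
    return applicable and reproducible

# Vacuuming: 49% of men and 83% of women are from the slide;
# the overall percentage and retest_r are hypothetical.
vacuuming_kept = keep_item(pct_all=66.0, pct_men=49.0, pct_women=83.0, retest_r=0.8)

# Entirely hypothetical item that passes every criterion.
other_item_kept = keep_item(pct_all=95.0, pct_men=94.0, pct_women=96.0, retest_r=0.75)

print("vacuuming kept:", vacuuming_kept)
print("other item kept:", other_item_kept)
```

Vacuuming fails because it applies to fewer than 70% of men, regardless of the other values.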


Criteria for item reduction

- Internal consistency
  - Eliminate items with correlation ≤ 0.3 between item score and: (1) mean of all items in the domain without that item; (2) the global question score for the domain
- Responsiveness to change
  - Eliminate items with correlation ≤ 0.3 between the residual change score of the item (pre-treatment to post-treatment) and the residual change score of the domain Global Score
- Discriminative ability
  - Eliminate items with a skewness statistic > 2 times the standard error of this statistic


Measuring change

- Problem with change scores: regression to the mean (tendency of outlying scores to return to the mean)
  - by chance, low pre-test scores will be higher on post-test and high pre-test scores will be lower on post-test
- Possible solution: residual change scores
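
A residual change score is the part of the post-test score that is not predicted by the pre-test score. A minimal sketch, assuming a simple least-squares regression of post-test on pre-test and using hypothetical scores:

```python
def residual_change_scores(pre, post):
    """Residuals from the least-squares regression of post-test on pre-test scores."""
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    sxx = sum((x - mean_pre) ** 2 for x in pre)
    sxy = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = mean_post - slope * mean_pre
    # Observed post-test score minus the score predicted from pre-test
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]

# Hypothetical pre- and post-treatment scores for five subjects
pre = [2, 3, 4, 5, 6]
post = [3, 3, 5, 4, 6]

residuals = residual_change_scores(pre, post)
print([round(r, 2) for r in residuals])
```

By construction the residuals are uncorrelated with the pre-test scores, which is why they are less affected by regression to the mean than raw differences (post minus pre).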


Selection of final domains

- Selection of domains: relative impact and importance study subjects attributed to each domain
  - mean score of the global question for each domain and domain rankings
  - calculated for each study population as well as by gender
  - committee of co-investigators reviewed these data and, through consensus discussion, arrived at a choice of priority domains and the number of items of each domain the final instrument should include
- Selection among remaining items


Comparison of Global Question Mean Scores for Each Domain between Quebec and Ontario Study Populations

[Chart: mean global question score (axis 1 to 7) by domain, Ontario vs Québec]

Domain      Ontario  Québec
Work        5.4      5.3
Sleep       4.9      5.2
Recreation  4        4.3
Mood        3.9      3.8
Housework   3.7      4
Esteem      3.5      4
Self care   3.3      3.4
Financial   3        2
Driving     2.9      3
Sex life    2.9      4.2
Iatrogenic  2.5      3.2
Social      2.1      2.6


Comparison of Mean Ranking for Each Domain between Quebec and Ontario Study Populations

[Chart: 13 - mean rank by domain, Ontario vs Quebec]

Domain            Ontario  Quebec
Work              10.8     10.3
Household/family  8.4      8.1
Sleep             8.1      8.6
Mood              6.9      6.3
Recreation        6.5      7.2
Financial         6.5      4.2
Self care         6.5      6.7
Driving           5.9      6.3
Esteem            5.7      5.1
Iatrogenic        4.8      5.9
Social            4.7      4.8
Sex life          3.7      4.5


Selection of remaining items

Selection of the most responsive and most discriminating items that covered the priority domains

Number of items chosen so that the instrument takes no more than 5-10 minutes to complete (version 1 = 35 items; version 2 = 20 items)

Choice among items with similar responsiveness and discriminative ability was based on the clinical judgement of the co-investigator research committee


Translation into French

Double reverse parallel translation method (Vallerand 1989)
  translation of the English questionnaire into French by two independent translators (versions A and B)
  the two French versions (A and B) translated back into English by two different translators (versions C and D)
  versions C and D compared to the original English version by a committee of three bilingual study researchers (two francophones, one anglophone), with discrepancies resolved through consensus to arrive at a revised French translation, version E
  version E pre-tested on 16 francophone workers with neck or upper extremity disorders to identify ambiguous or difficult-to-understand items
  results of the pre-test reviewed by the research translation committee, which agreed upon a final French version of the questionnaire (version F)


Criteria for acceptance of a French formulation

meaning of the French version was as close as possible to the English one

the simplest term would be selected (in order to be understandable at a Grade 6 or lower reading level)

French syntax would be respected

the terms most commonly used in current Quebec French would be selected


Comparison of English and French item reduction results

Compare demographic profile of the 2 populations

Compare English and French subjects' mean responses for the global question of each domain, by t-test for univariate analyses and by multiple regression analyses controlling for sex, age, income and duration of symptoms

Compare English and French subjects' mean ranking scores for each domain, by Wilcoxon rank-sum test for univariate analyses and by partial Spearman correlations between the mean ranking score of each domain and the study group status (i.e., English or French study group), controlling for sex, age, income and duration of symptoms
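The univariate ranking comparison can be illustrated with a small sketch of the Wilcoxon rank-sum test. This is only an illustration under stated assumptions: the function names and the example rankings are hypothetical, and the p-value uses the large-sample normal approximation without a tie correction, which may differ from what the study's statistical software did.

```python
from math import sqrt, erfc

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])                        # rank sum of group A
    mean_w = n_a * (n_a + n_b + 1) / 2          # expected rank sum under H0
    sd_w = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p_two_sided = erfc(abs(z) / sqrt(2))
    return w, z, p_two_sided

# Hypothetical domain rankings given by English and French subjects.
english = [2, 3, 3, 5, 6, 4]
french = [7, 8, 6, 9, 5, 10]
w, z, p = rank_sum_test(english, french)
```

With samples as small as this toy data an exact test would normally be preferred; the normal approximation is more reasonable at the study's sample sizes (n = 119 and n = 93).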


Comparison of Ontario and Quebec study populations

                                             Ontario (n=119)            Quebec (n=93)
Gender                                       40.3% female; 59.7% male   55.9% female; 44.1% male
Mean age (S.D.)                              39.7 yr (± 10.1)           41.1 yr (± 10.0)
% cases with duration of injury > 6 months   30.4                       58.8
% cases off work                             72.9                       57.0
% cases on WCB                               67.8                       26.9

The Quebec study population was more likely to be female (p=.02), to have had symptoms for more than 6 months (p=.001) and to still be at work (p=.02), and less likely to be on WCB benefits (p=.0001)






Comparison of mean rank of each domain between English and French study subjects: univariate analyses

Domain                                 Wilcoxon rank-sum test (p)
Personal care                          .798
Family and domestic responsibilities   .448
Work                                   .099
Transportation                         .412
Mood                                   .168
Self esteem                            .069
Sleep                                  .251
Sexual life                            .018
Recreation                             .084
Social life                            .442
Financial impact                       .000
Iatrogenic effects                     .017


Correlation of study status (English or French) to mean domain ranking controlling for age, gender, income and duration of symptoms

Domain                                 Partial Spearman correlation coefficient   p
Personal care                           .003                                      .966
Family and domestic responsibilities   -.037                                      .622
Work                                   -.092                                      .216
Transportation                          .076                                      .313
Mood                                   -.116                                      .122
Self esteem                            -.135                                      .075
Sleep                                   .077                                      .305
Sexual life                             .146                                      .051
Recreation                              .170                                      .022
Social life                             .061                                      .415
Financial impact                       -.293                                      .0001
Iatrogenic effects                      .180                                      .015


Multiple regression for each domain to assess whether study status (English or French) was a predictor of the mean score of the domain global question when controlling for age, gender, income and duration of symptoms

Domain                                 Standardised coefficient   Signif.
Personal care                          -.016                      .829
Family and domestic responsibilities   -.004                      .950
Work                                   -.041                      .587
Transportation                         -.032                      .676
Mood                                   -.080                      .290
Self esteem                            -.020                      .791
Sleep                                  -.013                      .869
Sexual life                             .309 [1]                  .0004
Recreation                              .102                      .193
Social life                             .108                      .156
Financial impact                       -.230 [2]                  .002
Iatrogenic effects                      .149 [1]                  .049

[1] A positive coefficient indicates that French study subjects had significantly higher mean global scores than English subjects for that domain

[2] A negative coefficient indicates that English study subjects had significantly higher mean global scores than French subjects for that domain


Synthesis of English-French comparisons

Sexual life:
  Statistically significant differences in mean ranking and mean domain global score, but clinically insignificant difference in ranking
  Domain did not meet applicability criteria

Financial impact/iatrogenic effects:
  Statistically significant differences in mean ranking and mean domain global score, probably reflecting differences in the proportion of subjects off work and differences in clinical treatment program

Overall, no major differences in mean domain rankings or mean domain scores or in results of individual item reduction

A single instrument could be developed for both populations


Final instrument

20 items:
  4 work
  7 physical activities (self care, domestic responsibilities, leisure)
  6 psychosocial (mood, self esteem, social role function)
  2 sleep
  1 iatrogenic


Validation of a health status measure

Internal consistency
Reproducibility (test re-test reliability)
Validity
  Content
  Criterion or convergent
  Construct
  Predictive
Responsive to change


Measures of internal consistency

Cronbach alpha (0.0-1.0)
  An estimate of the correlation between the total score across a series of items from a rating scale and the total score that would have been obtained had a comparable series of items been employed
Inter-item correlations
Item-total correlations (total ± item)
Correlation of item to mean of items (mean ± item)
Split half reliability (items randomly divided and 2 sub-scales correlated)
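As an illustration of the first measure, here is a minimal sketch of Cronbach's alpha using the usual variance formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total score). The function name and the response data are hypothetical and are not taken from the NULI study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    items = list(zip(*scores))                         # one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])  # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses: 4 subjects x 2 items on a 1-7 scale.
responses = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(responses)
```

The sketch assumes complete data (no missing item responses) and at least two items and two respondents.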


Reliability

Test re-test reliability: the stability exhibited when a measurement is repeated under identical conditions
  calculation of the intra-class correlation (ICC) for two administrations of the index, 3-7 days apart, in 99 Ontario subjects and 33 Quebec subjects

Internal consistency: intercorrelation between items of a scale meant to measure the same concept
  Cronbach's alpha calculated for 119 Ontario subjects and 93 Quebec subjects present at the initial pre-treatment administration of the questionnaires
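A test re-test ICC can be sketched as follows. The slides do not say which form of the ICC was used, so this example assumes the one-way random-effects form, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW); the function name and the scores are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for n subjects x k administrations."""
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand) ** 2 for m in subj_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(scores, subj_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical total scores at two administrations a few days apart.
test_retest = [[42, 40], [55, 57], [30, 33], [61, 60], [48, 45]]
icc = icc_oneway(test_retest)
```

Other ICC forms (for example two-way models that separate out a systematic shift between administrations) give somewhat different values, so the form should be reported alongside the coefficient.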


Ways of improving reproducibility

Increase the number of items in a test or measurement scale
Increase the number of response choices for each item
Reduce inter-observer variation (training of interviewers, standardised protocol)
Reduce ambiguity in questions
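The effect of the first point, scale length, is often approximated with the Spearman-Brown prophecy formula. The formula is not mentioned on the slide and is brought in here only as an illustration; it assumes the added or removed items are comparable to the existing ones. The function name and the numbers are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale is lengthened by length_factor.

    length_factor = new number of items / original number of items;
    values below 1 describe shortening the scale.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical scale with reliability 0.80, doubled in length.
doubled = spearman_brown(0.80, 2)       # about 0.889
# The same scale cut to half its length.
halved = spearman_brown(0.80, 0.5)      # about 0.667
```

The gain from adding items shrinks as reliability approaches 1, which is one reason a shorter instrument can be acceptable when completion time matters.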


Validity

An expression of the degree to which a measurement measures what it purports to measure (Last)

Is the scale measuring what it was intended to measure?


How do we demonstrate validity?

Subjective judgement by "experts":
  Face validity: the extent to which, on the face of it, the measurement appears to be assessing the desired qualities
  Content validity: the extent to which the measurement incorporates all the relevant content or domains of the phenomenon under study


How do we demonstrate validity?

Criterion validity: extent to which the measurement correlates with an external criterion (preferably a "gold standard")


How do we demonstrate validity?

Convergent (concurrent) validity: correlation between the measurement of interest and another measurement known to measure the same concept and measured at the same time (0.4-0.8)

Predictive validity: ability of a measurement to predict the criterion


Reliability and Responsiveness of Revised 20-item NULI / IDVQ

                                                          Ontario (based on original items)   Quebec (based on revised format)
Reproducibility: test re-test reliability (ICC)           0.88 (n = 99)                       0.83 (n = 33)
Internal consistency (Cronbach alpha)                     0.90 (n = 119)                      0.93 (n = 93)
Responsiveness (standardised response mean with 95% CI)   1.48 (1.1-1.8) (n = 33)             1.63 (1.3-2.0) (n = 35)


Example of convergent validity: Pearson’s Correlations between NULI and other measures

Measure                                           Ontario (based on original items)   Québec (based on revised format)
1-item Global question (mean subject-clinician)    0.60                                0.73
Pain Scale                                         0.42                                0.55
Shoulder abduction                                -0.32                               -0.47
Scratch test                                       0.37                                0.30
Hand grip strength (Jamar)                         0.29                               -0.41
SIP                                                0.66                                N/A
Physical component SF-36                           N/A                                -0.50
Mental component SF-36                             N/A                                -0.52
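Each entry in this table is a Pearson correlation between paired scores on the NULI and on a comparison measure. A minimal sketch of that computation follows; the function name and the paired scores are hypothetical, not study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores: disability index vs. a 0-10 pain scale.
index_scores = [22, 35, 41, 50, 63, 70]
pain_scores = [2, 4, 3, 6, 7, 8]
r = pearson_r(index_scores, pain_scores)
```

A negative coefficient is expected whenever the two scales run in opposite directions, for instance when higher scores on one measure indicate better function while higher scores on the other indicate more disability.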


How do we demonstrate validity?

Construct validity: the extent to which the measurement corresponds to theoretical constructs concerning the phenomenon under study
  e.g., testing a hypothesis about whether the measure will distinguish between 2 groups who differ with respect to the concept of interest
  e.g., NULI: those who returned to work had significantly lower NULI scores at the post-treatment administration than those who did not return to work at that time

Tests theory and measure at the same time
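A known-groups comparison of this kind can be sketched with a two-sample test. The slides do not state which test was used for the return-to-work comparison, so the example below assumes Welch's unequal-variance t-test; the function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = variance(group_a) / len(group_a)   # squared standard error, group A
    vb = variance(group_b) / len(group_b)   # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1)
                           + vb ** 2 / (len(group_b) - 1))
    return t, df

# Hypothetical post-treatment scores by return-to-work status.
returned = [18, 22, 25, 30, 27]
not_returned = [41, 38, 52, 47, 44]
t_stat, dof = welch_t(returned, not_returned)
```

A clearly negative t here would match the hypothesised direction (lower scores in those who returned to work); the p-value would be read from a t distribution with the computed degrees of freedom.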


Responsiveness

The ability of a measure to detect change (in the construct being measured) over time
AKA « sensitivity to change »
Important when testing effectiveness of an intervention


Statistical measures of responsiveness

Effect size: ability to detect the effect of treatments
  Ratio of the difference between groups to the variability within groups
  Numerator: raw change score
  Denominator, one of:
    standard deviation of pre-test scores
    SD of change scores
    standard error of change score
    SD of change score in stable subjects

Example: Standardised response mean = mean change score / SD of change scores
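Two of these ratios can be sketched directly: the standardised response mean (SD of change scores in the denominator) and an effect size that uses the SD of pre-test scores. Function names and scores are hypothetical. Change is computed here as post minus pre, which is an assumption; the sign of the result therefore depends on whether improvement raises or lowers the score.

```python
from statistics import mean, stdev

def change_scores(pre, post):
    """Per-subject change, defined here as post-treatment minus pre-treatment."""
    return [b - a for a, b in zip(pre, post)]

def standardised_response_mean(pre, post):
    """Mean change divided by the SD of the change scores."""
    change = change_scores(pre, post)
    return mean(change) / stdev(change)

def effect_size_pre_sd(pre, post):
    """Mean change divided by the SD of the pre-treatment scores."""
    return mean(change_scores(pre, post)) / stdev(pre)

# Hypothetical scores for 4 subjects before and after treatment.
pre = [10, 12, 14, 16]
post = [8, 9, 10, 11]
srm = standardised_response_mean(pre, post)
es = effect_size_pre_sd(pre, post)
```

Because the two denominators differ, the same data yield different magnitudes, so responsiveness statistics are comparable across instruments only when the same definition is used.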


Responsiveness to change of NULI

Standardized response mean (SRM) calculated for 33 Ontario subjects and 35 Quebec subjects whom both subject and clinician deemed improved on a 1-item global question of disability


Comparison of standardised response means of Revised 20-item NULI / IDVQ and other measures

Measure                      Ontario               Québec
NULI (IDVQ)-20                1.48 (1.1,1.8)        1.63 (1.3,2.0)
Pain Scale                    1.22 (0.9,1.6)        1.73 (1.4,2.1)
Shoulder abduction           -0.61 (-1.0,-0.3)     -1.16 (-1.6,-0.7)
Scratch test                  0.02 (-0.3,0.4)       0.59 (0.2,1.0)
Hand grip strength (Jamar)   -0.80 (-1.2,-0.5)     -0.33 (-0.8,0.1)
SIP (total)                   1.14 (0.8,1.5)        -
Physical component SF-36      -                    -1.26 (-1.6,-0.9)
Mental component SF-36        -                    -0.48 (-0.5,0.2)


Existing instrument vs. designing your own

Development of a reliable, valid instrument is a lengthy, complicated process

Whenever possible, use an existing instrument with known reliability and validity

When choosing among existing instruments, choose the instrument with the best reliability, validity and/or responsiveness that will measure the concept you wish to measure


Choosing a health outcome measure: Internet resources

1. Quality of life compendium: choosing a quality of life instrument (from the Dept of Public Health and Primary Health Care, University of Bergen, Norway)
   www.uib.no/isf/people/doc/qol/comp0006.htm

2. Quality of Life Assessment in Medicine - Internet Resources
   http://www.qlmed.org/url.htm

3. Clinician's computer-assisted guide to the choice of instruments for quality of life assessment in medicine
   http://www.glamm.com/ql/guide.htm

4. Medical Outcomes Trust Scientific Advisory Committee Instrument Review Criteria
   http://www.outcomes-trust.org/bulletin/34sacrev.htm