Measurement issues

Jean Bourbeau, MD

Respiratory Epidemiology and Clinical Research Unit

McGill University

Clinical Epidemiology (679)

June 19, 2006

Objectives

Define categorical and continuous variables

Define 2 sources of variation: biological and measurement error (random and bias)

Describe the classifications of measures and their focus: functional, descriptive and methodological

Define and discuss the advantages and disadvantages of objective and subjective health measures

Define the psychometric properties of measurement instruments: reliability, validity, responsiveness

Discuss key questions and concerns about each of the psychometric properties of an instrument: reliability, validity and responsiveness

Define and discuss minimal clinically important difference

Reading

Fletcher, Chapter 2

Outline of Measurement issues

1. Measurements

2. Sources of variation

3. Classification

4. Health measurements

5. Measurement properties

Examples

In a 60-year-old patient after right hemicolectomy, the Dukes stage is a widely accepted, indispensable descriptive tool for planning further treatment.

Adjuvant postoperative chemotherapy is currently the recommended treatment for resected Dukes C colon cancer.

Examples

In a 20-year-old woman with right lower quadrant pain and vomiting, the likely diagnosis is appendicitis or a gynecological infection.

After excluding pelvic inflammatory disease, an experienced surgeon or gastroenterologist will diagnose appendicitis based on history, clinical findings and ultrasound.

Measurement

We need to assign numbers to certain clinical phenomena to make them manageable and “scientific”

Measurement

Measure:

•A scale or test is an instrument to measure a clinical phenomenon; a score is a value on the scale in a given patient

Measurement

The attributes or events that are measured in a research study are called « variables »

Variables are of 2 types:

•Categorical

•Continuous

Categorical variables

•Also called discrete variables

•Dichotomous or polychotomous (multilevel):

- Nominal

- Ordinal

Dichotomous categorical variables

Examples:

•Vital status (alive vs dead)

•Yes or no (response to a question)

•Sex (male vs female)

Polychotomous categorical variables

Nominal:

•Named categories that bear no ordered relationship to one another

Example:

•Hair colour, race, or country of origin

Nominal scale

Hierarchy of mathematical adequacy:

•Lowest level (not a measurement but a classification)

•Uses numbers as labels for categories (such as male or female)

•No inference can be drawn from the relative size of the numbers used

Polychotomous categorical variables

Ordinal:

•Named categories that bear an ordered relationship to one another

•The intervals need not be equal

Example:

•Ordinal pain scale that includes « pain severity »: none, mild, moderate, and severe

•Deep tendon reflex: absent, 1+, 2+, 3+, or 4+

Ordinal scale

Hierarchy of mathematical adequacy:

•Numbers are again used as labels for response categories

•Numbers reflect the increasing order of the characteristics being measured (mild, moderate, severe)

•The numeric values, and the differences between them, hold no intrinsic meaning

Continuous variables

•Also called dimensional, quantitative or interval variables

•Expressed as integers, fractions, or decimals in which equal distances exist between successive intervals

•Examples: age, blood pressure, temperature

Interval scale

Hierarchy of mathematical adequacy:

•Numbers are assigned to the response categories in such a way that a unit change represents a constant change across the range of the scale (temperature in degrees Celsius)

Ratio scale

Hierarchy of mathematical adequacy:

•With a ratio scale, it becomes possible to state how many times greater one score is than another

•This improves on the interval scale by including a true zero point

Scales

•Binary

•Rank order (small to large)

•Continuous (0 to ∞)

•Ratios
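
The difference between an interval and a ratio scale can be illustrated with temperature. The sketch below is only an illustration: the two readings and the helper name `celsius_to_kelvin` are assumptions, and the Kelvin scale is brought in as a familiar example of a scale with a true zero point (it is not mentioned in the slides).

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale temperature (Celsius) to the ratio-scale Kelvin."""
    return celsius + 273.15


# Two hypothetical readings, chosen only for illustration.
warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences are meaningful on an interval scale and survive the conversion.
difference_c = warm_c - cool_c
difference_k = warm_k - cool_k

# Ratios depend on where zero sits, so only the Kelvin ratio is interpretable.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"difference: {difference_c:.2f} (Celsius) vs {difference_k:.2f} (Kelvin)")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The Celsius ratio of 2 does not mean "twice as hot"; the same two temperatures expressed in Kelvin differ by only a few percent.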

Sources of variation

2 sources of variation:

•Biological variation

•Measurement error

Biological variation

Sources:

•Dynamic nature of most biologic entities (differences in age, sex, race, or disease status)

•Temporal variation (sometimes predictable, such as the diurnal cycle of plasma cortisol)

Measurement error

2 different types:

•Random (chance error)

•Bias (systematic error)

Measurement error

Can arise from:

•The method (the measuring instrument)

•The observer (the measurer)

We can talk about the variability between methods of making the measurement or between the observers

Repeated measurements by the same method or observer

• Intramethod or Intraobserver

Between two or more methods or observers

• Intermethod or Interobserver
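
Interobserver variability on a categorical reading is often summarized with a chance-corrected agreement statistic. The slides do not name one; the sketch below assumes Cohen's kappa, and the two observers' readings are invented for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two observers rating the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both observers must rate the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two observers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each observer's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both observers used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical readings of the same 10 films by two observers.
observer_1 = ["normal", "normal", "abnormal", "abnormal", "normal",
              "abnormal", "normal", "normal", "abnormal", "normal"]
observer_2 = ["normal", "abnormal", "abnormal", "abnormal", "normal",
              "normal", "normal", "normal", "abnormal", "normal"]

kappa = cohen_kappa(observer_1, observer_2)
print(f"raw agreement 8/10, kappa = {kappa:.2f}")
```

The same function applied to two readings by one observer at different times would give an intraobserver estimate.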

Consequences of erroneous measurement

Individual:

•Makes no difference whether the error is systematic or random

Group:

•Variability in the absence of bias should not change the average group value

•However, it can have deleterious consequences when one is seeking associations or correlations between 2 measures (analytic bias)
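
The group-level consequences can be illustrated with a small simulation. This is a sketch under assumed values (normally distributed data, the means and standard deviations shown, a fixed random seed); it is not taken from the lecture.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000
true_x = [rng.gauss(100, 15) for _ in range(n)]
# A second measure that is genuinely related to the true value.
y = [x + rng.gauss(0, 10) for x in true_x]
# The same attribute measured with purely random (unbiased) error.
observed_x = [x + rng.gauss(0, 15) for x in true_x]

mean_true = sum(true_x) / n
mean_observed = sum(observed_x) / n
r_true = pearson_r(true_x, y)
r_observed = pearson_r(observed_x, y)

print(f"group mean: true {mean_true:.1f}, observed {mean_observed:.1f}")
print(f"correlation with y: true {r_true:.2f}, observed {r_observed:.2f}")
```

The group mean is essentially unchanged by random error, while the correlation with the second measure is visibly weakened.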

Regression toward the mean

•Individual measurement is subject to both biologic variation and measurement error

•An extremely high or low value obtained in an individual from a group is more likely to be an error than is an intermediate value

•On remeasurement, the tendency of an extreme value to move toward a less extreme value is greater than the tendency for an intermediate value to become more extreme
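
Regression toward the mean can be reproduced in a few lines. The sketch below assumes each person has a stable true value and is measured twice with independent random error; the distribution parameters, the 10% cut-off and the seed are illustrative assumptions.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 10000
true_values = [rng.gauss(100, 10) for _ in range(n)]
# Two measurements of each person, each with independent random error.
first = [t + rng.gauss(0, 10) for t in true_values]
second = [t + rng.gauss(0, 10) for t in true_values]

# Select the people whose FIRST measurement was in the top 10%.
cutoff = sorted(first)[int(0.9 * n)]
extreme = [i for i in range(n) if first[i] >= cutoff]

mean_first = sum(first[i] for i in extreme) / len(extreme)
mean_second = sum(second[i] for i in extreme) / len(extreme)
grand_mean = sum(first) / n

print(f"group mean {grand_mean:.1f}")
print(f"top 10% at first measurement: {mean_first:.1f}, on remeasurement: {mean_second:.1f}")
```

Nothing was done to these people between measurements, yet the selected group appears to "improve" toward the group mean. A study that enrols patients because of an extreme value needs a control group for this reason.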

Classifications of measures

Functional classifications focus on:

• Purpose or application of the measures

Descriptive classifications focus on:

• Their scope

Methodological classifications focus on:

• Technical aspects

Functional classification

•Measures have discriminative, evaluative or predictive properties

•Choice of measure depends on the purpose(s) for which it will be used

Functional classification

Discriminative instrument:

Can discriminate between people with different levels of a particular attribute or disease

• For example:

•NYHA scale

•MRC dyspnea scale

MRC Dyspnea Scale

Grade 1 Breathless with strenuous exercise

Grade 2 Short of breath when hurrying on the level or walking up a slight hill

Grade 3 Walks slower than people of the same age on the level or stops for breath while walking at own pace on the level

Grade 4 Stops for breath after walking 100 yards

Grade 5 Too breathless to leave the house or breathless when dressing

[Arrow alongside the grades: none to severe]

Functional classification

Predictive instrument:

•Can predict the probability of a clinical diagnosis (diagnostic test) or the likelihood of a future event (prognostic test)

[Figure: 5-year survival in COPD according to staging as defined by the ATS Guidelines (% predicted FEV1) and according to the level of dyspnea as evaluated by the MRC Dyspnea Scale. Nishimura K, et al. Chest 2002; 121: 1434-1440.]

Functional classification

Evaluative instrument:

Can measure change over time in the same person

•For example:

•Dyspnea subscale of the Chronic Respiratory Questionnaire (CRQ) (COPD disease-specific quality of life questionnaire)

Descriptive classification

•Large number of possible categories

•Can categorize instruments by:

•Content: domains of interest (dyspnea, fatigue, emotion)

•Generic or disease-specific

General questionnaires:

•Used in any population

•Cross-condition comparison

•Co-morbid conditions and effects of treatment covered

•Do not focus on HRQL/COPD

•Irrelevant items

•Insensitive to small changes

Disease-specific questionnaires (COPD):

•Focus on relevant aspects of HRQL

•Greater sensitivity for disease changes

•Increased responsiveness

•No comparisons across conditions

Methodological classification

•Large number of possible categories

•Can categorize by:

• Interviewer-administered versus self-administered

•Objective versus subjective

Health measurements

Measurements may be based on:

•Laboratory or diagnostic tests (objective)

• Indicators in which the patient or the clinician makes a judgement (subjective)

Health measurements

Unfortunately, « subjective » is also used in other ways:

•To indicate if the variable is observable or not

Examples:

•Objective indicator such as « The ability to climb stairs »

•Subjective indicators such as « pain or feelings »

Objective vs Subjective

Objective:

• More often continuous (lab data)

• Few categorical (vital status, sex and race)

Subjective:

• Greater potential for bias or variability on the part of the observer

• Many variables that are most important in caring for patients are « soft » and subjective

• For example: pain, mood, dyspnea, ability to work, HRQL

The example of CABG

Why is quality of life important in studies of CABG patients?

•Survival with surgery > medical treatment for patients with left main and triple-vessel disease

•Survival similar in patients with less severe disease

CASS NEJM 1984; European cooperative study Lancet 1982.

Subjective vs Objective measurement

As Feinstein has emphasized:

The tendency of clinical investigators to focus on “objective” rather than “subjective” measurements can result in research that is both dehumanizing and irrelevant

Objective vs Subjective

Data traditionally considered objective or “hard” can be seen to have feet of softer clay

Example:

•X-ray or cytopathologic diagnoses have been shown to be subject to considerable intra- and interobserver variability

Subjective health measurements

May be grouped into 3 main categories:

• General feelings of well-being

• Symptoms of illness

• Adequacy of a person’s functioning

Subjective health measurements

Advantages:

• Amplify the data obtainable from morbidity and mortality statistics

• Give insights into matters of human concern such as pain, suffering or depression

• Offer a systematic way to record the « voice of the patient »

• Do not require expensive or invasive procedures

Subjective health measurements

Disadvantages:

•Contrast sharply with the inherent reliability of mortality rates

•Seem more susceptible to bias

•Applying these measures to an entire population is more difficult or impossible

Subjective health measurements

The use of rating methods suitable for statistical analysis permits subjective health measurements to rival the quantitative strengths of the traditional “objective” indicators

Health measurements

Scientific basis:

•Subjective judgements as a valid approach to measurement derive from the field of psychophysics;

•Psychophysical principles were later incorporated into psychometrics from which most of the techniques used to develop subjective measurements of health have been derived

Psychometric properties

Definition:

•Psychometrics is the science of using standardized tests or scales to measure attributes of a person or object

Numerical estimates of health

Many scaling methods exist for:

•Translating « indicators » into numerical estimates of severity

•Once this is done, the estimates may be combined into an overall score, termed a « health index »

Psychometric properties

Criteria for a scoring system:

•Reliability

•Validity

•Responsiveness

•Minimal clinically important difference (MCID)

Reliability

Definition:

•The extent to which the same results are obtained when the measurement is repeated

Lack of reliability may reflect either true (temporal) variation or random measurement error

Reliability

Key Questions:

•Internal consistency

•Test-retest reliability (reproducibility)

Key Concern:

•Error (error attenuates relationships between variables, and makes it more difficult to detect treatment effects)
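
Internal consistency is commonly summarized with Cronbach's alpha. The slides do not name a statistic, so alpha is an assumption here, and the questionnaire data below are invented: 5 respondents answering a hypothetical 3-item scale.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per respondent."""
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance(item) for item in zip(*responses)]
    # Sample variance of the total (summed) score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: rows are respondents, columns are items.
scores = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Test-retest reliability would instead compare total scores from two administrations of the same instrument in stable patients.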

Validity

Definition:

•The extent to which the measurement corresponds to the « true » value (some accepted « gold standard »), or behaves as expected

Validity depends on minimizing measurement error caused by bias

Types of measurement validity

Content validity

Construct validity (convergent, discriminant)

Criterion validity (predictive, concurrent)

Cross-cultural validity

“Situational” validity

Content validity

Definition:

•The extent to which the items sampled for inclusion in the instrument adequately represent the domain of content (particular domain area) addressed by the instrument

Content validity

Key Questions:

•Theoretical foundation of the instrument

• Instrument development: primary sources of information, sources of items and scaling structure selection

•Rules applied for content validation: patient and/or clinician validation; scientific review

• Instrument is appropriate for the study under consideration

Content validity

Key concern:

•Without validity, an instrument has no meaning

Construct validity

Definition:

•The extent to which the instrument measures an abstract concept (construct) or attribute; evaluated by comparison with instruments measuring related constructs

•Convergent (agrees with other instruments measuring the same concept) or discriminant (truly measures something different from other instruments)

Criterion validity

Definition:

•Extent to which the instrument relates to an external criterion (criterion of practical value)

•Concurrent (able to correlate with a present criterion) or predictive (able to correlate with a future criterion)

Construct validity

It is important to understand that a direct test of the validity of an abstract concept such as impaired health due to disease is not possible

Construct validity

Key Questions:

•Factor structure of the measure is consistent with expectations

•Scores from the instrument correlate with those of other instruments (measuring the same or related constructs)

•Scores from the instrument are independent of scores from instruments measuring dissimilar constructs

•Scores differentiate groups known to differ on the attribute being measured, e.g. on HRQL

Testing construct validity

•The most widely used method is the multitrait-multimethod matrix

•It involves testing a series of hypotheses concerning relationships between the new instrument and a range of reference measures of disease activity
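
A single convergent and a single discriminant hypothesis can be checked with plain correlations, as sketched below. This is a much-reduced illustration, not a full multitrait-multimethod matrix; the three score lists and their names are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same 6 patients.
new_instrument = [10, 20, 30, 40, 50, 60]
related_measure = [12, 18, 33, 37, 52, 58]   # expected to converge
unrelated_measure = [5, 3, 4, 4, 3, 5]       # expected to be independent

r_convergent = pearson_r(new_instrument, related_measure)
r_discriminant = pearson_r(new_instrument, unrelated_measure)

print(f"convergent r = {r_convergent:.2f}")
print(f"discriminant r = {r_discriminant:.2f}")
```

In practice each hypothesis (expected direction and rough size of the correlation) would be stated before the data are examined.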

Construct validity

Key concern:

•Without validity, an instrument has no meaning

Cross-cultural validity

Definition:

•The extent to which an instrument developed and tested in one cultural group is appropriate for, and behaves similarly in, another

Cross-cultural validity

Key Questions:

•Items appropriate for the culture under consideration

•Instrument translated culturally and linguistically

•Evidence of reliability and validity

“Situational” validity

Definition:

•The extent to which an instrument is appropriate for use in any given situation

“Situational” validity

Key Questions:

•Instrument should measure an appropriate outcome for the trial

•Instrument should be valid for the specific purpose of the trial

•Sufficiently reliable and responsive for this purpose

•Sample size sufficient to detect change in the outcome measure of interest

“Situational” validity

Key Issues:

•Validity can be situation specific; an instrument valid for one situation is not necessarily valid for another

•Failure to detect treatment effects may be a function of study design, rather than a limitation of the instrument

Responsiveness

Definition:

The extent to which scores change with a given change in the condition or disease state

Key Questions:

• Instrument has been evaluated for responsiveness

• Effect sizes have been associated with the instrument in well-designed trials

Key concerns:

• The ability to track changes
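
Two commonly reported responsiveness indices are the effect size (mean change divided by the standard deviation of baseline scores) and the standardized response mean (mean change divided by the standard deviation of the change). The slides mention effect sizes without defining them, so these definitions and the before/after scores below are assumptions for illustration.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the individual changes."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical questionnaire scores for 6 patients before and after treatment.
baseline = [3.0, 3.5, 4.0, 2.5, 3.0, 4.5]
follow_up = [3.8, 4.0, 4.9, 3.5, 3.4, 5.2]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(f"effect size = {es:.2f}, standardized response mean = {srm:.2f}")
```

The two indices can differ considerably: the standardized response mean is large here because every patient changed by a similar amount.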

MCID

Definition:

The smallest difference that clinicians and patients would care about

Key Questions:

• Has the MCID been established?

• What was the method used?

Key concerns:

• The ability to detect true treatment effects
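
One way to use an MCID is to compare the confidence interval of a treatment effect against it. The sketch below assumes a scale on which higher scores are better; the function name, the three labels and all numbers are hypothetical and are not results from any trial.

```python
def interpret_against_mcid(ci_low: float, ci_high: float, mcid: float) -> str:
    """Classify a treatment effect's confidence interval relative to the MCID."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if mcid <= 0:
        raise ValueError("mcid must be positive")
    if ci_low >= mcid:
        return "clinically important"   # whole interval at or above the MCID
    if ci_high < mcid:
        return "below MCID"             # whole interval below the MCID
    return "inconclusive"               # interval straddles the MCID


# Hypothetical MCID and confidence intervals, for illustration only.
hypothetical_mcid = 0.5
print(interpret_against_mcid(0.6, 1.2, hypothetical_mcid))
print(interpret_against_mcid(0.1, 0.4, hypothetical_mcid))
print(interpret_against_mcid(0.3, 0.9, hypothetical_mcid))
```

An effect can be statistically significant (interval excludes zero) and still fall in the "below MCID" group, which is why the two questions are kept separate.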

Benefits of Pulmonary Rehabilitation

[Figure: functional exercise capacity, 6-MWD (N=444); health status, CRQ dyspnea (N=519). Lacasse Y, et al. Cochrane Database Syst Rev 2002; 3:CD003793.]

Key messages

Some simple criteria:

•The system must address a well defined clinical phenomenon

•The scale has to have a clearly defined ranking in a hierarchical order (reasonable clinical or mathematical criteria)

•The different stages or categories have to be mutually exclusive

•The scale has to be adapted to the area of measurement where it will be applied

•Creating complex or composite scores such as quality of life requires one to address issues concerning the inner structure of a score

Key messages

Quote from McDowell and Newell:

•Ultimately the selection of a measurement contains an element of art and perhaps even luck; it is often prudent to apply more than one measurement whenever possible.

•This has the advantage of reinforcing the conclusions of the study when the results from ostensibly similar methods are in agreement, and it also serves to increase our general understanding of the comparability of the measurements we use.
