Duke Robert J. Margolis, MD Center for Health Policy Clinical Outcome Assessments: Establishing and Interpreting Meaningful Within-Patient Change 4/4/17

  • Duke Robert J. Margolis, MD Center for Health Policy

    Clinical Outcome Assessments: Establishing and Interpreting

    Meaningful Within-Patient Change 4/4/17

  • CLINICAL OUTCOME ASSESSMENTS: INTERPRETING MEANINGFUL CHANGE

    Duke Margolis Expert Workshop April 4, 2017

    Elektra J. Papadopoulos, MD, MPH

    Clinical Outcome Assessments Staff

    Office of New Drugs

    Center for Drug Evaluation and Research

    U.S. Food and Drug Administration

    www.fda.gov

  • Role of Patient Perspective

    Dr. Janet Woodcock:

    "It turns out that what is really bothering the patient and what is really bothering the doctor can be radically different things."

  • Framing of FDA Drug Benefit-Risk Assessment

    Framework columns: Decision Factor | Evidence and Uncertainties | Conclusions and Reasons

    Sets the context for the weighing of benefits and risks:

    How serious is this indicated condition, and why?

    How well is the patient population's medical need being met by currently available therapies?

    Benefit: Characterize and assess the evidence of benefit:

    How meaningful is the benefit, and for whom?

    How compelling is the expected benefit in the post-market setting?

    Risk: Characterize and assess the safety concerns:

    How serious are the safety signals identified in the submitted data?

    What potential risks could emerge in the post-market setting?

    Risk Management: Assess what risk management (e.g., labeling, REMS) may be necessary to address the identified safety concerns

    Benefit-Risk Summary and Assessment

  • FDA's Patient-Focused Drug Development (PFDD) Initiative

    Patients are uniquely positioned to inform understanding of the therapeutic context for drug development and evaluation

    There is a need for more systematic ways of gathering patient perspective on their condition and treatment options

    Patient-Focused Drug Development (PFDD) is part of FDA commitments under PDUFA V*

    FDA is convening 24 meetings on specific disease areas in FY 2013-17

    Meetings can help advance a systematic approach to gathering input

    *The fifth authorization of the Prescription Drug User Fee Act, enacted in 2012

  • PFDD in Chronic Disease

    PFDD meetings routinely ask for patients' perspectives on what an ideal treatment would look like and what clinical benefit would be the most meaningful to them

    Concepts such as emotional impact of disease and ability to perform activities are often cited by patients as important

    E.g., in Parkinson's disease, patients want to know functional status over time

    Depending on the stage of disease, even small amounts of deterioration can make the difference between being able to perform basic activities (e.g., feeding oneself) independently or not

  • PFDD Next Steps

    Advance science of patient input

    Engage wider community to discuss methodologically sound approaches that:

    Bridge from initial PFDD meetings to more systematic collection of patients' input

    Generate meaningful input on patients' experiences and perspectives to inform drug development and B-R assessment

    Are fit for purpose in drug development and regulatory context

    Provide guidance

    To: patient communities, researchers, and drug developers

    On: pragmatic and methodologically sound strategies, pathways, and methods to gather and use patient input

  • Interpretation of Clinically Meaningful

    Statistical significance alone is not sufficient

    Clinical benefit: a positive clinically meaningful effect of an intervention, i.e., a positive effect on how an individual feels, functions, or survives.

    To establish clinical benefit we consider two questions:

    1) Does the assessment measure or reflect something of significance to patients? Relies on patient, caregiver and expert input/engagement

    2) Is the magnitude of change at the individual level sufficiently large to affect how patients feel or function in daily life?

  • Triangulation of Evidence

    Multiple methods used to select a benchmark for meaningful change

    Often result in a range of values for what is a clinically meaningful benchmark

    Triangulation of evidence consists of examining these values to converge on an appropriate value or range of values likely to represent meaningful change in the outcome of interest
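
    As a rough illustration of triangulation, the sketch below pools candidate benchmarks from several methods and reports their range and median. The method labels and values are hypothetical, not taken from any study, and converging on a final value remains a judgment rather than a calculation.

```python
from statistics import median

# Hypothetical candidate benchmarks (score points) from different methods.
candidates = {
    "anchor: patient global impression of change": 4.0,
    "anchor: clinician global rating": 5.0,
    "distribution: 0.5 SD (supportive only)": 3.5,
    "scale judgment: median endorsed distance": 3.0,
}


def triangulate(values):
    """Summarize candidate benchmarks as a range plus a central value."""
    ordered = sorted(values)
    return {"low": ordered[0], "high": ordered[-1], "median": median(ordered)}


summary = triangulate(candidates.values())
print(summary)
```

    The summary only frames the discussion: reviewers would still weigh how credible each source is before settling on a value or range.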


  • Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims

    U.S. Department of Health and Human Services, Food and Drug Administration

    Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), Center for Devices and Radiological Health (CDRH)

    December 2009, Clinical/Medical

    http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM205269.pdf

    Good Measurement Principles

    FDA PRO Guidance defines good measurement principles to consider for well-defined and reliable (21 CFR 314.126) PRO measures

    All COAs can benefit from the good measurement principles described within the guidance

    But, judgment and flexibility are needed!

  • Final PRO Guidance (2009)

    Clinically meaningful thresholds may vary by target population: ...we will evaluate an instrument's responder definition in the context of each specific clinical trial.

    Anchor-based methods emphasized:

    Empiric evidence for any responder definition is derived using anchor-based methods

    ...explore the associations between the targeted concept of the PRO instrument and the concept measured by the anchors

    Multiple anchors recommended

    Distribution-based methods:

    ...should be considered as supportive and are not appropriate as the sole basis for determining a responder definition
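
    A minimal sketch of the two families of methods, assuming hypothetical patient records and a hypothetical three-level anchor (a global impression of change). The anchor-based estimate is the mean change within each anchor category. The distribution-based value (half the baseline SD, one common convention and not one prescribed by the guidance) is computed only as a supportive check.

```python
from statistics import mean, stdev

# Hypothetical records: (baseline score, change from baseline, anchor category).
records = [
    (60, -8, "much better"), (58, -6, "much better"),
    (55, -4, "somewhat better"), (62, -3, "somewhat better"),
    (57, -5, "somewhat better"), (50, 0, "not changed"),
    (53, -1, "not changed"), (61, 1, "not changed"),
]


def anchor_based_estimates(rows):
    """Mean change from baseline within each anchor category."""
    groups = {}
    for _, change, anchor in rows:
        groups.setdefault(anchor, []).append(change)
    return {anchor: mean(changes) for anchor, changes in groups.items()}


def half_sd_of_baseline(rows):
    """Distribution-based benchmark: half the baseline standard deviation."""
    return 0.5 * stdev([baseline for baseline, _, _ in rows])


anchor_means = anchor_based_estimates(records)
supportive = half_sd_of_baseline(records)
print(anchor_means, round(supportive, 2))
```

    With more than one anchor, the same grouping would be repeated per anchor and the results compared.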

  • Final PRO Guidance (2009)

    Emphasizes the display of individual responses to treatment:

    ...it is possible to present the entire distribution of responses for treatment and control group, avoiding the need to pick a responder criterion. Whether the individual responses are meaningful represents a judgment...

    ...cumulative distribution displays show a continuous plot of the percent change from baseline on the X-axis and the percent of patients experiencing that change on the Y-axis.

    A variety of responder definitions can be identified along the cumulative distribution of response curve.
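
    The cumulative distribution display described above can be sketched as follows. The data are hypothetical, and the sketch uses raw change from baseline (negative meaning improvement) rather than percent change, so it illustrates the idea and does not reproduce any particular plot.

```python
def cumulative_distribution(changes, thresholds):
    """Percent of patients whose change from baseline is <= each threshold.

    Negative change means improvement, so the value at threshold t is the
    percent of patients who improved by at least |t|.
    """
    n = len(changes)
    return {t: 100.0 * sum(c <= t for c in changes) / n for t in thresholds}


# Hypothetical change-from-baseline values for two arms.
treatment = [-4, -3, -3, -2, -2, -2, -1, -1, 0, 1]
control = [-3, -2, -1, -1, -1, 0, 0, 0, 1, 1]
thresholds = range(-4, 2)

cdf_treatment = cumulative_distribution(treatment, thresholds)
cdf_control = cumulative_distribution(control, thresholds)
for t in thresholds:
    print(f"{t:>3}: treatment {cdf_treatment[t]:5.1f}%  control {cdf_control[t]:5.1f}%")
```

    Each threshold on the X-axis corresponds to one possible responder definition, and the vertical gap between the two curves at that threshold is the difference in responder rates.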

  • But

    The presentation of all possible response level cut-off points does not eliminate the need to identify the level of change that is clinically important (or at least to state our uncertainty about that level)

  • Cumulative Distribution Function (CDF) (DB4, pooled across treatment arms)

    (Figure: cumulative distribution curves by anchor category. X-axis: Change from Baseline in Nocturia Episodes, from -6 to 2, with the direction of improvement indicated. Y-axis: Cumulative Percentage Responder Rate, 0 to 100. Curves: Much Better (n=298), Somewhat Better (n=288), Not Changed (n=185).)

    Source: Dr. Jia Guo; Bone, Reproductive and Urologic Drugs Advisory Committee 10/19/2016

  • CDF Plot by Treatment Arms (DB4)

    (Figure: cumulative distribution curves by treatment arm. X-axis: Change from Baseline in Nocturia Episodes, from -6 to 2, with the direction of improvement indicated. Y-axis: Responder Rate, 0 to 100. Curves: SER 120 1.5 mcg (n=260), Placebo (n=260). One point on the plot is annotated 36%.)

    Source: Dr. Jia Guo; Bone, Reproductive and Urologic Drugs Advisory Committee 10/19/2016

  • Establishing Meaningful Change: Examples from FDA Guidance

    Alzheimer's Disease: Developing Drugs for the Treatment of Early Stage Disease (2013)

    Co-primary endpoint of cognitive test and a functional or global assessment

    The intent of this dual measurement is to ensure the clinical meaningfulness of a cognitive benefit that may be observed

    Irritable Bowel Syndrome: Clinical Evaluation of Drugs for Treatment (2012)

    Patient global assessments

    Example: How would you rate your IBS signs or symptoms overall over the past 7 days?

    Analgesic Indications: Developing Drugs and Biological Products (2014)

    Allows the use of a responder analysis (e.g., 30% reduction in pain with early discontinuation counted as failure) in addition to differences in group means

    Encourages use of cumulative distribution functions in the package insert
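
    The responder rule in the analgesic example (at least a 30% reduction in pain, with early discontinuation counted as failure) can be sketched as below. The patient values, the function name, and the handling of a zero baseline are assumptions made for illustration.

```python
def is_responder(baseline, final, completed, min_reduction=0.30):
    """True if pain fell by at least min_reduction and the patient completed."""
    if not completed or baseline <= 0:
        return False  # early discontinuation counts as failure
    return (baseline - final) / baseline >= min_reduction


# Hypothetical patients: (baseline pain, final pain, completed the trial?)
patients = [
    (8, 5, True),    # 37.5% reduction: responder
    (7, 5, True),    # about 28.6% reduction: non-responder
    (6, 2, False),   # large reduction but discontinued early: failure
    (10, 7, True),   # exactly 30% reduction: responder
]

flags = [is_responder(b, f, c) for b, f, c in patients]
responder_rate = sum(flags) / len(flags)
print(flags, responder_rate)
```

    In a trial this rate would be computed per arm and reported alongside the difference in group means.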

  • A word about MID and MCID

    Minimum important difference (MID) was removed from the Final PRO Guidance 2009

    Confusion resulted from the term being used interchangeably to indicate either group-level mean differences or individual-level change

    Use of the term "minimal" is problematic: While a minimal amount of change may be noticeable, it does not necessarily imply the change is meaningful to patients

    Minimal clinically important difference (MCID), i.e., the smallest difference in score

  • Beyond Anchor-based and Distribution-based methods: Examples of Emerging Methods

    Bookmarking/Standard Setting

    Patients and experts are presented with clinical vignettes of a disease in order to reach a consensus on thresholds for severity levels

    Designed for measures that have been calibrated using an IRT model

    Scale judgment

    Panels of judges evaluate pairs of completed tests to determine whether the amount of change specified by the responses before and after treatment is meaningful

    Exit interviews

    Interviews of patients who recently completed a clinical trial can be used to collect qualitative and quantitative data about patients' experience of disease or treatment burden and changes during the course of the clinical trial

    Others

  • Today's Goals

    Advance the discussion on methods to identify meaningful within-patient change in COAs by discussing key issues and major challenges, including:

    What are the advantages and disadvantages of each of the methods?

    How might threshold determinations differ across the four types of COAs?

    What are special considerations for establishing meaningful change in small and heterogeneous study populations?

    How and when could these methods be most feasibly used in drug development?

  • U.S. FOOD & DRUG

    ADMINISTRATION

  • Exploring the Use of Emerging Methods to Derive and Interpret Meaningful Within-Patient Change Using Idio

    Scale-Judgment (Bookmarking/Standard-Setting)

    Karon F. Cook

    Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University,

    Chicago, IL

    April 4, 2017

    Washington, DC

  • Background

    (Figure; vertical axis: How much we know)

    Building State of the Art Measures

    How to Interpret Scores on State of the Art Measures

  • Meaningful Change

  • Methods for Defining Meaningful Score

    Differences

    Statistical

    Global Ratings of Change

    External Anchors

  • (Figure: response curves for the fatigue item "How often are you too tired to socialize with family?" X-axis: Fatigue. Y-axis: Probability of specified response, 0.0 to 1.0. Curves labeled never, sometimes, often, always.)

  • (Figure: response categories Never, Rarely, Sometimes, and Always located along a T-score scale with ticks at 30, 40, 50, 60, and 70.)

  • Item Response Samples

    FATIGUE (T-score scale from 30 to 70)

    T=40

    never was too tired to do household chores.

    never needed to sleep during the day.

    rarely had trouble finishing things because she was too tired.

    rarely was so tired that she needed to rest during the day.

    rarely felt that she had no energy.

  • Item Response Samples

    (T-score scale from 30 to 70)

    T=62

    sometimes was too tired to eat.

    often had trouble finishing things because she was too tired.

    often was too tired to do her household chores.

    often needed to sleep during the day.

    always frustrated by being too tired to do the things she wanted to do.

  • National Multiple Sclerosis Society

    Grant #H00145 Deborah Miller, PI


  • Online panel of 500 participants with Multiple Sclerosis

    Responded to Neuro-QoL Fatigue Short Form

    Developed 18 five-item sample response sets, 2 points apart, e.g., Ms. Butler

    Branched into 7 fatigue levels, e.g., 48-51

    Presented with 7 response samples

  • SCREEN SHOTS

    (Graphic: MY FATIGUE compared with OTHER PERSON'S FATIGUE)

    In PART B, you will

    Look at the fatigue reports of 7 people who have MS

    Compare each person's fatigue to your own fatigue. For example, your fatigue might be greater.

    Or, you might decide your fatigue is the SAME or LESS than that other person's.

  • SCREEN SHOTS

    Would Matter

    If you decide your fatigue is DIFFERENT from the other person's, you will then

    Consider what it would be like to have this person's fatigue, and

    Decide if the difference would matter to you in your daily life.

  • SCREEN SHOTS

    Depending on your own fatigue, you may decide that none, some, or all of these people have more, less, or the same amount of fatigue.

    There are no right answers, just your own thoughtful judgments.

  • [T Score = 58]

    This is what Ms. Anderson said about her fatigue over the last 7 days. She reported that she:

    sometimes felt weak all over.

    often had to limit social activity because she was tired.

    sometimes had trouble starting things because she was too tired.

    often was too tired to take a short walk.

    often had trouble finishing things because she was too tired.

    Compared to Ms. Anderson's, has YOUR FATIGUE been:

    Greater than Ms. Anderson's

    The same as Ms. Anderson's

    Less than Ms. Anderson's

  • You said YOUR FATIGUE over the past week was Greater

    If your fatigue IMPROVED to Ms. Anderson's level, would it make a difference in your daily life?

    It wouldn't really make a difference in my daily life.

    It would make a difference in my daily life (things I do day-to-day would be easier).

    (Graphic: scale from Less to Greater comparing MY FATIGUE with MS. ANDERSON'S FATIGUE)

    This is what Ms. Anderson said about her fatigue over the last 7 days. She reported that she:

    sometimes felt weak all over.

    often or always had to limit social activity because she was tired.

    sometimes had trouble starting things because she was too tired.

    often was too tired to take a short walk.

    often had trouble finishing things because she was too tired.

  • You said YOUR FATIGUE over the past week was LESS than MS. ANDERSON'S FATIGUE.

    If your fatigue WORSENED to Ms. Anderson's level, would it make a difference in your daily life?

    It wouldn't really make a difference in my daily life.

    It would make a difference in my daily life (many of the things I do day-to-day would be harder).

    (Graphic: scale from Less to Greater comparing MY FATIGUE with MS. ANDERSON'S FATIGUE)

    This is what Ms. Anderson said about her fatigue over the last 7 days. She reported that she:

    sometimes felt weak all over.

    often or always had to limit social activity because she was tired.

    sometimes had trouble starting things because she was too tired.

    often was too tired to take a short walk.

    often had trouble finishing things because she was too tired.

  • (Figure: T-score scale from 52 to 64 in 2-point steps with vignettes labeled Butler, Richardson, Woods, Anderson, Foster, Allen, and Harris. Example respondent: T = 58.8, Score Group 56-60. Vignette distances from the respondent are marked 7 points >, 5 points >, 3 points >, 1 pt, 1 pt, 3 points <, and 5 points <, with asterisks on some vignettes.)

  • ANALYSIS

    Results

  • Analyses to Estimate Thresholds for Interpreting Change

    Calculate minimum distance endorsed by respondent as meaningful improvement/decrement

    Identify thresholds that would capture different percentages of respondents' minimums.
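
    The two analysis steps can be sketched as follows, assuming each respondent's judgments are stored as (distance in T-score points, would it matter?) pairs. The data, the function names, and the percentile rule (smallest observed minimum that covers the requested share of respondents) are illustrative assumptions and not the study's actual code.

```python
import math


def minimum_endorsed_distance(judgments):
    """Smallest distance a respondent endorsed as making a difference.

    Returns None when the respondent endorsed no distance at all.
    """
    endorsed = [distance for distance, matters in judgments if matters]
    return min(endorsed) if endorsed else None


def threshold_capturing(minimums, percent):
    """Smallest observed minimum that covers `percent`% of respondents' minimums."""
    ordered = sorted(minimums)
    rank = max(math.ceil(percent / 100 * len(ordered)), 1)
    return ordered[rank - 1]


# Hypothetical judgments for six respondents.
respondents = [
    [(1, False), (3, True), (5, True)],
    [(1, True), (3, True), (5, True)],
    [(1, False), (3, False), (5, True)],
    [(1, False), (3, True), (7, True)],
    [(3, False), (5, False), (7, True)],
    [(1, False), (3, False)],  # endorsed nothing; excluded below
]

minimums = [m for m in map(minimum_endorsed_distance, respondents) if m is not None]
thresholds = {p: threshold_capturing(minimums, p) for p in (50, 75, 90)}
print(minimums, thresholds)
```

    A higher percentile gives a more conservative threshold: a larger change is required, but it would count as meaningful for more respondents.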

  • Thresholds for Worsening

    (Figure: points show the mean of individual thresholds, y-axis from 0 to 14, by Neuro-QoL Fatigue T-Score, x-axis from 20 to 80, with horizontal reference lines at the 50th, 75th, and 90th percentiles.)

    Threshold locations for capturing 50, 75, and 95% of distances endorsed as important worsening

  • Thresholds for Improvement

    (Figure: points show the mean of individual thresholds, y-axis from 0 to -17, by Neuro-QoL Fatigue T-Score, x-axis from 20 to 80, with horizontal reference lines at the 50th percentile = -3.3, 75th percentile = -5.1, and 90th percentile = -7.3.)

    Threshold locations for capturing 50, 75, and 95% of distances endorsed as important improvement

  • (Figure: T-score scale from 52 to 64 in 2-point steps with vignettes labeled Richardson, Woods, Anderson, and Butler, distances marked 7 points >, 5 points >, 3 points >, and 1 pt, and asterisks on some vignettes.)

    8% Reversals; 92% Current Judgments Consistent with Prior Judgment

  • Inconsistent Judgment

    (Graphic: scale from Less to Greater comparing MY FATIGUE with MS. ANDERSON'S FATIGUE, shown above a T-score scale from 52 to 64 with the Anderson vignette marked.)

    7.6% of all judgments were inconsistent with the one prior (of 3000 opportunities)

    (Bar chart: FREQUENCY OF INCONSISTENT JUDGMENTS. X-axis categories 0, 1, 2, 3; bar values 344, 94, 51, 11; y-axis from 0 to 400.)
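
    The chart values and the 7.6% figure can be cross-checked with simple arithmetic, assuming the values 344, 94, 51, and 11 are the numbers of participants who made 0, 1, 2, and 3 inconsistent judgments (an interpretation of the chart, not something stated on the slide).

```python
# Assumed reading of the chart: participants by number of inconsistent judgments.
participants_by_inconsistencies = {0: 344, 1: 94, 2: 51, 3: 11}
opportunities = 3000  # stated on the slide

n_participants = sum(participants_by_inconsistencies.values())
n_inconsistent = sum(k * n for k, n in participants_by_inconsistencies.items())
rate_percent = 100 * n_inconsistent / opportunities

print(n_participants, n_inconsistent, round(rate_percent, 1))
```

    Under that reading, the counts add up to the 500-person panel, and 229 inconsistent judgments out of 3000 opportunities gives about 7.6%, which matches the slide.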

  • Qual Life Res (2017) 26:847-857, DOI 10.1007/s11136-016-1414-

    When global rating of change contradicts observed change: Examining appraisal processes underlying paradoxical responses over time

    Carolyn E. Schwartz, Victoria E. Powell, Bruce D. Rapkin

    Accepted: 14 September 2016 / Published online: October 2016

    Mental health component scores over time in MS

    (Figure labels: Scores of N=525; Declined; Unchanged)

    On GROC, 48.6% made a paradoxical judgment: reporting worse status when observed score was unchanged or endorsing the same status when observed scores had declined.

  • We were able to estimate plausible responder thresholds for consequential change.

    Participants evaluated a range of IRT-Vs close to their own fatigue levels.

    Participants reported high confidence in their judgments (3.3, between moderately and highly confident).

  • Judgments contextualized in a patient-relevant context: "make a difference in daily life."

    Design allowed large samples.

    Judgment errors existed, but were within range of other methods.

    Judgment errors not strongly associated with demographics (e.g., education)

  • How can we do better?

    Qualitative research to understand what is important to people in assessing change.

    Cognitive debriefing to understand what people are attending to. Are they attending to different things? Consider the selection of concepts for the vignettes: should they be standardized?

    Frame vignettes by what is important to people.

    Try to understand the variation in levels of change that people believe is important.

    Can set a threshold that is most representative, but it will not catch everyone.

    Change study design so that everyone is getting the same distances. Branch on every score.

  • An Alternative Model:

  • Some Remarks about (Educational) Standard Setting, Characterizing Meaningful Change, and the Scale-Judgment Method

    David Thissen

    L.L. Thurstone Psychometric Laboratory

    The University of North Carolina at Chapel Hill

  • Minimally Important Difference (MID) estimation is not like answering the question

    What is the ratio of the circumference of a circle

    to its diameter?

  • Minimally Important Difference (MID) estimation is like

    At least for regulatory purposes, MID is like a speed limit: A policy decision informed by data

    (And likely between 0.2 and 0.5 standard units)
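
    If the 0.2 to 0.5 standard-unit range is applied to a T-score metric such as the Neuro-QoL fatigue scores discussed earlier, it translates into score points as sketched below. This assumes the usual T-score convention of a standard deviation of 10 in the reference population; the function name is illustrative.

```python
T_SCORE_SD = 10.0  # T-scores are scaled to mean 50, SD 10 in the reference population


def standard_units_to_t_points(effect, sd=T_SCORE_SD):
    """Convert a difference in standard (SD) units into T-score points."""
    return effect * sd


low, high = (standard_units_to_t_points(x) for x in (0.2, 0.5))
print(low, high)
```

    On that convention the range corresponds to roughly 2 to 5 T-score points. Where to set the value within or near that range remains, as the slide puts it, a policy decision informed by data.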

  • Most authorities on standard setting (e.g., Green, Trimble, and Lewis, 2003; Hambleton, 1980; Jaeger, 1989; Shepard, 1980; Zieky, 2001) suggest that, when setting cut scores, it is prudent to use and compare results from different standard setting methods. (p. 155)

    Green, D.R., Trimble, C.S., and Lewis, D.M. (2003). Interpreting the results of three different standard-setting procedures. Educational Measurement: Issues and Practice, 22, 22-32.

    Hambleton, R.K. (1980). Test score validity and standard setting methods. In D.C. Berliner (Ed.), Criterion-referenced measurement: The state of the art (pp. 80-123). Baltimore, MD: Johns Hopkins University Press.

    Jaeger, R.M. (1989). Certification of student competence. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 485-514). Washington, DC: American Council on Education.

    Shepard, L.A. (1980). Standard setting issues and methods. Applied Psychological Measurement, 4, 447-467.

    Zieky, M.J. (2001). So much has changed: How the setting of cutscores has evolved since the 1980s. In G.J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum.

  • Section II: Standard Setting Methods

    Chapter 4. The Nedelsky Method

    Chapter 5. The Ebel Method

    Chapter 6. The Angoff Method and Angoff Variations

    Chapter 7. The Direct Consensus Method

    Chapter 8. The Contrasting Groups and Borderline Group Methods

    Chapter 9. The Body of Work and Other Holistic Methods: The Body of Work Method; The Judgmental Policy Capturing Method; The Dominant Profile Method; The Analytic Judgment Method

    Chapter 10. The Bookmark Method

    Chapter 11. The Item-Descriptor Matching Method

    Chapter 12. The Hofstee and Beuk Methods

  • National Assessment of Educational Progress (NAEP) Achievement Levels

    Grade 4 Mathematics, 1996

    Below Basic Basic Proficient Advanced

    100 150 200 250 300 350 NAEP Scale Score

  • A Ranking Procedure to Find Score Ranges Associated with Mild, Moderate, and Severe Conditions

    (Cella et al., 2008, 2014)

    Cella, D., Choi, S., Rosenbloom, S., Surges Tatum, D., Garcia, S., Lai, J.-S., George, J., & Gershon, R. (2008). A novel IRT-Based case-ranking approach to derive expert standards for symptom severity (paper presentation), International Society for Quality of Life Research Annual Scientific Meeting. Montevideo, Uruguay.

    Cella, D., Choi, S., Garcia, S., Cook, K. F., Rosenbloom, S., Lai, J. S., Tatum, D. S., & Gershon, R. (2014). Setting standards for severity of common symptoms in oncology using the PROMIS item banks and expert judgment. Quality of Life Research, 23, 2651-2661.

  • Sample Fatigue Vignette: T-score 40

    Rating ______

    FATIGUE - PINK

    1 How often did your fatigue make you feel less alert? Never / Rarely / Sometimes / Often / Always

    2 How often did you have trouble starting things because of your fatigue? Never / Rarely / Sometimes / Often / Always

    3 How often did you feel run-down? Never / Rarely / Sometimes / Often / Always

    4 How often were you energetic? Never / Rarely / Sometimes / Often / Always

    5 How easily did you find yourself getting tired on average? Not at all / A little bit / Somewhat / Quite a bit / Very much

    Graphic from: Cella, D., Choi, S., Rosenbloom, S., Surges Tatum, D., Garcia, S., Lai, J.-S., George, J., & Gershon, R. (2008). A novel IRT-Based case-ranking approach to derive expert standards for symptom severity (paper presentation), International Society for Quality of Life Research Annual Scientific Meeting. Montevideo, Uruguay.

  • Sample Depression Vignette: T-score 60

    Rating ______

    DEPRESSION - MINT

    1 I felt that I had nothing to look forward to: Never / Rarely / Sometimes / Often / Always

    2 I felt that I wanted to give up on everything: Never / Rarely / Sometimes / Often / Always

    3 I felt disappointed in myself: Never / Rarely / Sometimes / Often / Always

    4 I felt lonely: Never / Rarely / Sometimes / Often / Always

    5 I felt I had no reason for living: Never / Rarely / Sometimes / Often / Always

    Graphic from: Cella, D., Choi, S., Rosenbloom, S., Surges Tatum, D., Garcia, S., Lai, J.-S., George, J., & Gershon, R. (2008). A novel IRT-Based case-ranking approach to derive expert standards for symptom severity (paper presentation), International Society for Quality of Life Research Annual Scientific Meeting. Montevideo, Uruguay.

  • Expert Ranking Sheet: Anxiety, Step 1

    ANXIETY - Case Examples Exercise

    Step 1: Please review the ten different cards in the "Anxiety" envelope. Each card represents a patient who falls along a different place on the anxiety continuum. Sort the cards in order from least severe to most severe, giving each color a ranking ("1" being least severe).

    Please enter the card color (e.g. "Pink", "Blue", etc.) below the number ranking that you have assigned it. You are encouraged to give each card a unique ranking, but this is not required. If you believe two patients are tied, for example, at rank "6", then write both color names under the number "6."

    Least severe 1 2 3 4 5 6 7 8 9 10 Most severe

    COLOR:

    COLOR: (if applicable)

    COLOR: (if applicable)

    Step 2: Now please draw three vertical lines between ranks (e.g. between "3" and "4"); one delineating each of the following:

    1. A separation between those cards (i.e. patients) that you believe represent a normal level of anxiety and a mild level of anxiety

    2. A separation between those cards representing a mild level of anxiety and a moderate level of anxiety

    3. A separation between those cards representing a moderate level of anxiety and a severe level of anxiety

    Graphic from: Cella, D., Choi, S., Rosenbloom, S., Surges Tatum, D., Garcia, S., Lai, J.-S., George, J., & Gershon, R. (2008). A novel IRT-Based case-ranking approach to derive expert standards for symptom severity (paper presentation), International Society for Quality of Life Research Annual Scientific Meeting. Montevideo, Uruguay.

  • Expert Ranking Sheet: Anxiety, Step 2

    (Graphic: the same ANXIETY - Case Examples Exercise sheet as in Step 1, with the same Step 1 and Step 2 instructions, now showing three vertical lines drawn between ranks on the 1 to 10 scale to separate the Normal, Mild, Moderate, and Severe ranges.)

    Graphic from: Cella, D., Choi, S., Rosenbloom, S., Surges Tatum, D., Garcia, S., Lai, J.-S., George, J., & Gershon, R. (2008). A novel IRT-Based case-ranking approach to derive expert standards for symptom severity (paper presentation), International Society for Quality of Life Research Annual Scientific Meeting. Montevideo, Uruguay.

  • Anxiety

    (Figure: distribution over T-Score, x-axis from 30 to 80, y-axis from 0.00 to 0.05, with the values 52.5 and 62.5 marked and ranges labeled Normal, Mild, Moderate, and Severe.)

    After a consensus-building process

    Graphic from: Cella, D., Choi, S., Rosenbloom, S., Surges Tatum, D., Garcia, S., Lai, J.-S., George, J., & Gershon, R. (2008). A novel IRT-Based case-ranking approach to derive expert standards for symptom severity (paper presentation), International Society for Quality of Life Research Annual Scientific Meeting. Montevideo, Uruguay.

  • A Bookmarked-Vignettes Procedure to Find Score Ranges Associated with Mild, Moderate, and Severe Conditions

    (Cook et al., 2014; Morgan et al., 2017)

    Cook, K. F., Victorson, D. E., Cella, D., Schalet, B. D., & Miller, C. (2014). Creating meaningful cut-scores for Neuro-QOL measures of fatigue, physical functioning and sleep disturbance using standard setting with patients and providers. Quality of Life Research, 24, 575-589.

  • Judges do not know this vignette is for a T-score of 47.5

    Ms. Miller's Fatigue

    In the last 7 days, Ms. Miller rarely felt weak all over and rarely was so tired she couldn't take a short walk. However, she sometimes felt tired, which got in the way of her doing her household chores. Feeling too tired to do the things she wanted to do was sometimes frustrating for her.

    In summary, Ms. Miller reports being:

    Rarely weak all over.

    Rarely too tired to take a short walk.

    Sometimes tired.

    Sometimes too tired to do household chores.

    Sometimes frustrated by being too tired to do the things she wanted to do.

    Cook, K. F., Victorson, D. E., Cella, D., Schalet, B. D., & Miller, C. (2014). Creating meaningful cut-scores for Neuro-QOL measures of fatigue, physical functioning and sleep disturbance using standard setting with patients and providers. Quality of Life Research, 24, 575-589.

  • [Figure: bookmarking exercise. Vignette cards (Anna's Pain, Julia's Pain, Andrea's Pain, Jacob's Pain, Chloe's Pain, Kristen's Pain, Maya's Pain, Claire's Pain, Addison's Pain) are laid out in order, with bookmarks placed between No Problems, Mild Problems, Moderate Problems, and Severe Problems. The card text legible in the extraction reads:]

    In the last 7 days, Mr. Turner has rarely felt tired. Often, he has felt alert when he woke up and ready to start the day. In the last 7 days, he has never had trouble sleeping because of bad dreams and never had a hard time controlling his emotions because of poor sleep.

    In summary, Mr. Turner reported:

      Never having trouble sleeping because of bad dreams.
      Never having a hard time controlling his emotions because of poor sleep.
      Rarely feeling tired.
      Often feeling alert when he woke up.
      Often waking up and feeling ready to start the day.

    Graphic from: DeWitt, E.M. (2015, February 6). Establishing clinical meaning and defining important differences for PROMIS measures in Juvenile Idiopathic Arthritis. Presentation at UNC PROMIS Pediatric Investigators Meeting, Chapel Hill, NC.

  • [Figure: not recoverable from the extracted text]

    Graphic from: Morgan, E.M., Mara, C.A., Huang, B., Barnett, K., Carle,A.C., Farrell, J.E., & Cook, K.F. (2017). Establishing Clinical Meaning and Defining Important Differences for Patient-Reported Outcomes Measurement Information System (PROMIS) Measures in Juvenile Idiopathic Arthritis Using Standard Setting with Patients, Parents, and Providers. Quality of Life Research, 26, 565-586.

  • The Scale-Judgment Method to Estimate the Minimally Important Difference (MID)

    between Scores

    (Thissen et al., 2016)

    Thissen, D., Liu,Y., Magnus, B., Quinn, H., Gipson D.S., Dampier, C., Huang, I-C., Hinds, P.S., Reeve, B.B., Gross, H.E., & DeWalt, D.A. (2016). Estimating minimally important difference (MID) in PROMIS pediatric measures using the scale-judgment method. Quality of Life Research, 25, 13-23.

  • A minimally important difference (MID) has been defined as the smallest difference in score that patients perceive as important, and which would lead the clinician to consider a change in the patient's management (Guyatt et al., 2002).

    Existing methods:

      Distribution-based indices (not an empirical method; merely expresses change in standard units)

      Anchor-based methods (contrasting groups in educational standard setting)

    Guyatt, G. H., Osoba, D., Wu, A. W., Wyrwich, K. W., & Norman, G. R. (2002). Methods to explain the clinical significance of health status measures. Mayo Clinic Proceedings, 77, 371-383.

    Revicki, D., Hays, R. D., Cella, D., & Sloan, J. (2008). Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. Journal of Clinical Epidemiology, 61, 102-109.
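
    As a small illustration of the distribution-based indices listed above, the sketch below expresses a score change in standard-deviation units. It assumes a T-score metric with a standard deviation of 10 in the reference population; the function name `standardized_change` and the example scores are hypothetical and are not taken from the cited papers.

```python
def standardized_change(score_before: float, score_after: float, sd: float = 10.0) -> float:
    """Express a change score in standard-deviation units (a distribution-based index)."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (score_after - score_before) / sd


# Assumed example: a T-score that moves from 62.1 to 58.9.
change = standardized_change(62.1, 58.9)
print(round(change, 2))  # -0.32
```

    As the slide notes, this only rescales the change; it says nothing about whether patients consider the change important.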

  • Earlier judgment-based methods

    Delphi method, Delphi plus anchor, physician survey, expert panels using visual analog scales or changes to item responses

    Bellamy, N., Anastassiades, T. P., Buchanan, W. W., Davis, P., Lee, P., McCain, G. A., Wells, G. A., & Campbell, J. (1991). Rheumatoid arthritis antirheumatic drug trials. III. Setting the delta for clinical trials of antirheumatic drugs--results of a consensus development (Delphi) exercise. Journal of Rheumatology, 18, 1908-1915.

    Bellamy, N., Buchanan, W. W., Esdaile, J. M., Fam, A. G., Kean, W. F., Thompson, J. M., Wells, G. A., & Campbell, J. (1991). Ankylosing spondylitis antirheumatic drug trials. III. Setting the delta for clinical trials of antirheumatic drugs--results of a consensus development (Delphi) exercise. Journal of Rheumatology, 18, 1716-1722.

    Bellamy, N., Carette, S., Ford, P. M., Kean, W. F., le Riche, N. G., Lussier, A., Wells, G. A., & Campbell, J. (1992). Osteoarthritis antirheumatic drug trials. III. Setting the delta for clinical trials--results of a consensus development (Delphi) exercise. Journal of Rheumatology, 19, 451-457.

    Spiegel, B. M., Younossi, Z. M., Hays, R. D., Revicki, D., Robbins, S., & Kanwal, F. (2005). Impact of hepatitis C on health related quality of life: a systematic review and quantitative assessment. Hepatology, 41, 790-800.

    Wyrwich, K. W., Metz, S. M., Kroenke, K., Tierney, W. M., Babu, A. N., & Wolinsky, F. D. (2007). Triangulating patient and clinician perspectives on clinically important differences in health-related quality of life among patients with heart disease. Health Services Research, 42(6 Pt 1), 2257-2274; discussion 2294-2323.

    Wells, G., Li, T., Maxwell, L., MacLean, R., & Tugwell, P. (2007). Determining the minimal clinically important differences in activity, fatigue, and sleep quality in patients with rheumatoid arthritis. Journal of Rheumatology, 34, 280-289.

    Rai, S. K., Yazdany, J., Fortin, P. R., & Avina-Zubieta, J. A. (2015). Approaches for estimating minimal clinically important differences in systemic lupus erythematosus. Arthritis Research and Therapy, 17, 143.

    van Walraven, C., Mahon, J. L., Moher, D., Bohm, C., & Laupacis, A. (1999). Surveying physicians to determine the minimal important difference: implications for sample-size calculation. Journal of Clinical Epidemiology, 52, 717-723.

    Todd, K. H., & Funk, J. P. (1996). The minimum clinically important difference in physician-assigned visual analog pain scores. Academic Emergency Medicine, 3, 142-146.

    Dempster, H., Porepa, M., Young, N., & Feldman, B. M. (2001). The clinical meaning of functional outcome scores in children with juvenile arthritis. Arthritis and Rheumatology, 44, 1768-1774.

    Gong, G. W., Young, N. L., Dempster, H., Porepa, M., & Feldman, B. M. (2007). The Quality of My Life questionnaire: the minimal clinically important difference for pediatric rheumatology patients. Journal of Rheumatology, 34, 581-587.

  • [Figure: a pair of completed questionnaires, labeled "One month ago" (T-score 62.1) and "Today" (T-score 58.9). Both show the same eight items (I felt alone; I felt like I couldn't do anything right; I felt everything in my life went wrong; I felt sad; I thought that my life was bad; I could not stop feeling sad; I felt lonely; I felt unhappy), each with the response options never, almost never, sometimes, often, almost always.]

    The scale-judgment method presents judges with pairs of questionnaires, artificially completed using IRT, with scores known to the experimenter but not the judges (Thissen et al., 2016).

  • [Figure: the same questionnaire pair as on the previous slide ("One month ago", T-score 62.1; "Today", T-score 58.9).]

    The judges (clinicians, adolescents, parents) judge for each pair whether the (imaginary) respondent is doing or feeling better, worse, or about the same (Thissen et al., 2016).

  • For the Depressive Symptoms example, this process yields data with summary statistics like these:

          Scale Score                        Frequency                      Proportion
    Pair  1 month ago  Today  Difference    Better  No difference  Worse   Wrong Direction
    2     49.5         57.9    8.4            23      19            185    0.10
    3     56.7         62.1    5.4            32      18            176    0.14
    1     43.5         45.9    2.4            15     151             61    0.07
    5     64.3         62.1   -2.2           133      66             27    0.12
    4     62.1         58.9   -3.2           179      33             15    0.07
    6     73.4         66.0   -7.4           189      21             17    0.07

    There were more data for Fatigue, Mobility, and Pain (Thissen et al., 2016).
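
    The "Proportion Wrong Direction" column can be recomputed from the frequencies. The sketch below is an illustration only: it assumes that higher T-scores mean more depressive symptoms, so that "better" is the wrong direction when the score went up and "worse" is the wrong direction when it went down. The function name is hypothetical.

```python
# Rows copied from the Depressive Symptoms table:
# (pair, score one month ago, score today, n_better, n_no_difference, n_worse)
ROWS = [
    (2, 49.5, 57.9, 23, 19, 185),
    (3, 56.7, 62.1, 32, 18, 176),
    (1, 43.5, 45.9, 15, 151, 61),
    (5, 64.3, 62.1, 133, 66, 27),
    (4, 62.1, 58.9, 179, 33, 15),
    (6, 73.4, 66.0, 189, 21, 17),
]


def wrong_direction_proportion(ago, today, better, same, worse):
    """Share of judgments that contradict the sign of the score change.

    Assumption: a higher score means more symptoms, so a score increase
    makes "better" the wrong direction and a decrease makes "worse" wrong.
    """
    wrong = better if today > ago else worse
    return wrong / (better + same + worse)


proportions = [
    round(wrong_direction_proportion(ago, today, b, s, w), 2)
    for _, ago, today, b, s, w in ROWS
]
print(proportions)  # [0.1, 0.14, 0.07, 0.12, 0.07, 0.07]
```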

  • If the judges were homogeneous, data analysis could be logistic regression of the probability of a "different" judgment on the scale score difference, with the 50-50 point as the MID.

    [Figure: logistic curve of P("different"), from 0.0 to 1.0, against Scale Score Difference, from 0 to 8]

    But the judges were not homogeneous.

    Thissen, D., Liu,Y., Magnus, B., Quinn, H., Gipson D.S., Dampier, C., Huang, I-C., Hinds, P.S., Reeve, B.B., Gross, H.E., & DeWalt, D.A. (2016). Estimating minimally important difference (MID) in PROMIS pediatric measures using the scale-judgment method. Quality of Life Research, 25, 13-23.
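
    To make the homogeneous-judges idea concrete, here is a minimal sketch of a pooled logistic regression with the MID read off at the 50-50 point. It is not the authors' analysis, which the slides say was set aside because judges differ. The sketch assumes the Depressive Symptoms frequencies from the summary table, drops wrong-direction judgments, and treats a correct-direction judgment as "different"; the Newton-Raphson fitter `fit_logistic` is a hypothetical helper written for this example.

```python
import math

# (absolute scale-score difference, judged "different", judged "no difference")
# Assumption: wrong-direction judgments are omitted from the counts.
DATA = [
    (8.4, 185, 19),
    (5.4, 176, 18),
    (2.4, 61, 151),
    (2.2, 133, 66),
    (3.2, 179, 33),
    (7.4, 189, 21),
]


def fit_logistic(data, iterations=50):
    """Fit P(different) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yes, no in data:
            n = yes + no
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = yes - n * p        # contribution to the gradient
            w = n * p * (1.0 - p)      # contribution to the information matrix
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(DATA)
mid = -b0 / b1  # difference at which the fitted P(different) equals 0.5
print(round(mid, 2))
```

    Because the fit pools all judges, it ignores the heterogeneity that motivates the IRT treatment on the following slides, so the number it prints should not be read as a published estimate.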

  • So we treated the pairs of questionnaires as items, the same-different judgments as item responses, and fitted the data with the 1PL IRT model.

    [Figure: 1PL curves of P("different"), from 0.0 to 1.0, against propensity to respond "different", from -3 to 3; the curves are labeled with scale-score differences 5.7, 3.6, 2.4, 1.1, 5.6, and 2.3]

    Thissen, D., Liu,Y., Magnus, B., Quinn, H., Gipson D.S., Dampier, C., Huang, I-C., Hinds, P.S., Reeve, B.B., Gross, H.E., & DeWalt, D.A. (2016). Estimating minimally important difference (MID) in PROMIS pediatric measures using the scale-judgment method. Quality of Life Research, 25, 13-23.

  • Then we interpolated the scale-score difference for a hypothetical item that would be judged different 50% of the time by an average respondent.

    [Figure: the same 1PL curves of P("different") against propensity to respond "different", from -3 to 3, labeled with scale-score differences 5.7, 3.6, 2.4, 1.1, 5.6, and 2.3]

    Thissen, D., Liu,Y., Magnus, B., Quinn, H., Gipson D.S., Dampier, C., Huang, I-C., Hinds, P.S., Reeve, B.B., Gross, H.E., & DeWalt, D.A. (2016). Estimating minimally important difference (MID) in PROMIS pediatric measures using the scale-judgment method. Quality of Life Research, 25, 13-23.
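
    The link between the 1PL model and the 50% criterion can be sketched in a few lines. This is an illustration under stated assumptions: a unit slope, judge propensity `theta` centered at zero for the average judge, and a location parameter `b` for each questionnaire pair. The function name `p_different` is hypothetical.

```python
import math


def p_different(theta: float, b: float) -> float:
    """1PL probability that a judge with propensity theta calls a
    questionnaire pair with location b "different"."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# An average judge (theta = 0) facing a hypothetical pair located at b = 0:
print(p_different(0.0, 0.0))  # 0.5

# For the same judge, pairs with lower b are called "different" more often.
print(p_different(0.0, -1.0) > p_different(0.0, 1.0))  # True
```

    Under these assumptions, a pair with b = 0 is exactly the pair an average judge calls "different" half the time, which is why the MID is read off at b = 0.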

  • We used quadratic regression to interpolate the scale-score difference associated with a pair of questionnaires that would have a 1PL b of zero.

    [Figure: two panels, "Wrong Direction Omitted" and "Wrong Direction Reversed", each plotting Scale Score Difference (2 to 8) against b (-3 to 1) for Dep. Symp., Fatigue, Mobility, and Pain]

    Thissen, D., Liu,Y., Magnus, B., Quinn, H., Gipson D.S., Dampier, C., Huang, I-C., Hinds, P.S., Reeve, B.B., Gross, H.E., & DeWalt, D.A. (2016). Estimating minimally important difference (MID) in PROMIS pediatric measures using the scale-judgment method. Quality of Life Research, 25, 13-23.
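
    The interpolation step can be sketched as an ordinary least-squares quadratic of scale-score difference on b, evaluated at b = 0. The data points below are hypothetical and are NOT values from the study; `quadratic_fit` is a helper written for this example rather than the authors' code.

```python
def quadratic_fit(xs, ys):
    """Least-squares coefficients (c0, c1, c2) of y = c0 + c1*x + c2*x**2."""
    # Normal equations A c = v for the basis 1, x, x^2.
    s = [sum(x ** k for x in xs) for k in range(5)]
    A = [[float(s[i + j]) for j in range(3)] for i in range(3)]
    v = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        v[col], v[pivot] = v[pivot], v[col]
        for row in range(col + 1, 3):
            f = A[row][col] / A[col][col]
            for k in range(col, 3):
                A[row][k] -= f * A[col][k]
            v[row] -= f * v[col]
    # Back substitution.
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        c[i] = (v[i] - sum(A[i][k] * c[k] for k in range(i + 1, 3))) / A[i][i]
    return tuple(c)


# Hypothetical (b, scale-score difference) points, for illustration only.
bs = [-2.5, -2.0, -1.5, -0.5, 0.5, 1.0]
diffs = [8.0, 6.5, 5.0, 3.0, 2.0, 1.5]
c0, c1, c2 = quadratic_fit(bs, diffs)
mid_estimate = c0  # the fitted curve evaluated at b = 0
print(round(mid_estimate, 2))
```

    Evaluating the quadratic at b = 0 leaves only the intercept, so the interpolated MID is simply `c0`.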

  •              Wrong Direction Omitted    Wrong Direction Reversed
                 MID    s.e.                MID    s.e.
    Clinicians   2.1    0.6                 1.9    0.6
    Adolescents  2.2    0.6                 2.1    0.6
    Parents      2.4    0.7                 2.2    0.7

    MID is about two points on the T-score scale for these health outcomes measures, with no clear difference among the domains.

    Thissen, D., Liu,Y., Magnus, B., Quinn, H., Gipson D.S., Dampier, C., Huang, I-C., Hinds, P.S., Reeve, B.B., Gross, H.E., & DeWalt, D.A. (2016). Estimating minimally important difference (MID) in PROMIS pediatric measures using the scale-judgment method. Quality of Life Research, 25, 13-23.

  • A Free-Response Method to Estimate the Minimally Important Difference (MID)

    between Scores

    (Morgan et al., 2017)

    Morgan, E.M., Mara, C.A., Huang, B., Barnett, K., Carle,A.C., Farrell, J.E., & Cook, K.F. (2017). Establishing Clinical Meaning and Defining Important Differences for Patient-Reported Outcomes Measurement Information System (PROMIS) Measures in Juvenile Idiopathic Arthritis Using Standard Setting with Patients, Parents, and Providers. Quality of Life Research, 26, 565-586.

  • DeWitt, Cook, and their colleagues also used something like the scale-judgment method, but with the judges filling out the responses to the "after" protocol to make it minimally different from the (given) "pre" protocol.

    This can be conceptualized as a "free response" variant of the scale-judgment method.

    Morgan, E.M., Mara, C.A., Huang, B., Barnett, K., Carle,A.C., Farrell, J.E., & Cook, K.F. (2017). Establishing Clinical Meaning and Defining Important Differences for Patient-Reported Outcomes Measurement Information System (PROMIS) Measures in Juvenile Idiopathic Arthritis Using Standard Setting with Patients, Parents, and Providers. Quality of Life Research, 26, 565-586.

  • Morgan, E.M., Mara, C.A., Huang, B., Barnett, K., Carle,A.C., Farrell, J.E., & Cook, K.F. (2017). Establishing Clinical Meaning and Defining Important Differences for Patient-Reported Outcomes Measurement Information System (PROMIS) Measures in Juvenile Idiopathic Arthritis Using Standard Setting with Patients, Parents, and Providers. Quality of Life Research, 26, 565-586.

  • Future Research

    Do these different methods of data collection yield consistent results? Or are there predictable differences?

    How do results from these methods compare to results obtained with anchor-based methods, when anchors are available?

    Everyone finds differences between groups of judges... adolescents, parents, clinicians; what is to be made of that?

  • THE UNIVERSITY

    of NORTH CAROLINA at CHAPEL HILL

    Acknowledgments

    This work was funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant 1U01AR052181-01.

    Thanks to Dave Cella, Karon Cook, and Esi Morgan for their graphics used in this presentation, and my collaborators Yang Liu, Brooke Magnus, Hally Quinn, Debbie S. Gipson, Carlton Dampier, I-Chan Huang, Pamela S. Hinds, Bryce B. Reeve, Heather E. Gross, and Darren A. DeWalt at UNC and across the rest of the PROMIS pediatric multi-site project.


  • Clinical Trial Exit Interviews

    Presented at the Clinical Outcome Assessments: Establishing and Interpreting Meaningful Within-Patient Change Meeting

    The Duke-Margolis Center for Health Policy, Washington, DC, April 4, 2017

    Dana DiBenedetti, PhD Executive Director, Patient-Centered Outcomes Assessment

  • RTI Health Solutions. The power of knowledge. The value of understanding.

    Acknowledgments

    T. Michelle Brown, Carla (DeMuro) Romano, Lynda Doward, Claire Ervin, Sheri Fehnel, Sandy Lewis, Diane Whalley

  • What is an Exit Interview?

    The collection of (mostly) qualitative data from clinical trial participants
      Most commonly, interviews are conducted soon after participants complete the treatment period

    However, patients' (and/or caregivers') experiences and perspectives regarding treatment benefit may not be fully captured with traditional COAs.

    Interviews with clinical trial participants provide the opportunity to more fully explore the impacts of investigational products
      Describe the meaningfulness of treatment-related changes (positive and negative)
      Identify unanticipated treatment benefits

    Information regarding pre-study experiences, as well as treatment-related expectations and unmet needs, can also be collected.

  • Why Do Exit Interviews?

    To identify
      Characteristics of (sometimes new or rare) patient populations
      What symptoms/impacts are most important to patients

    Allows participants to articulate concepts that may be important to them but that are not obtained (or fully obtained) in the trial, thus
      Enriching researchers' and sponsors' understanding of the patient experience
      Aiding in interpretation of other clinical data

    Full impact of treatment (meaningful changes)
    Unmet needs of treatment
    Expectations for and experiences with disease and of treatment
    Thematic information used to inform future COA strategies and clinical trial designs
    Potential treatment differentiators

  • Exit Interviews

    Supplement, support, and facilitate the interpretation of data from traditional PRO, PerfO, ObsRO and/or clinical measures
      Provide greater depth and rationale for data from traditional measures
      Describe treatment effects
      Explore the relevance and clinical meaningfulness of specific treatment changes beyond clinical indices and side effects
      Explain anomalous results

  • Sample Interview Concepts

    Patients' (and Caregivers') Experiences With and Attitudes About Treatment

    Symptoms/impact prior to study start
    Expectations of changes/outcomes
      Can compare pre-study expectations with clinical outcomes
    Anticipated or unanticipated benefits, impact of those benefits
      Impact of treatment on daily life/functioning
      Impact of treatment on most important/bothersome symptoms
      Onset of benefits/changes
    Treatment experiences
      Convenience of visits, monitoring
      Managing treatment schedule (e.g., regimen schedule, infusions, monitoring)
      Most challenging aspect of study treatment
      Managing adverse events
    How well treatment addresses most important/bothersome symptoms
    Impact of treatment on daily life/functioning, quality of life
    Satisfaction levels with treatment
      Reasons for satisfaction

  • Potential Applications

    When to conduct interviews

    Both within and outside the context of a clinical trial
      Implementing as part of a clinical trial is generally more efficient and maximizes participation as compared with a separate or subsequent study

    At various time points (not just at the end of a study)
      Baseline, at key time point(s) during the study, at the end of a randomized treatment phase, at the end of open-label extension, etc.

    With all participants or select samples of study participants
      Participants can be selected by site, country, experience of a particular side effect, patient-reported data

  • Approaches to Conducting Patient Interviews

    Approach 1: Experienced, trained qualitative researchers conduct interviews
      Interviews conducted via telephone or in-person at designated time(s)
      Can be prospectively planned into the CT protocol or done as a substudy
      Interviews follow a semi-structured guide
      Values of this approach
        Richest source of data, robust methodologically
        Level of granularity from experienced interviewers
        Limits the variability in data quality (vs. large number of individuals with varying degrees of qualitative experience)
        Qualitative analysis usually done by interviewers themselves

    Approach 2: Study coordinators (SCs) conduct interviews
      Qualitative interviewers would develop interview guide/related materials, and provide training to SCs
        Certify, demonstrate proficiency
      Use a more standardized and heavily scripted interview guide
      SCs provide field notes, audio recording, etc. to qualitative researchers who analyze qualitative results
      Values of this approach
        Although data may be less in-depth than Approach 1:
          Particularly effective in global trials in which interview process needs to be scaled to allow for maximal participation
          Allows for interview to be conducted by someone familiar to patient

  • Issues to Consider in Operationalizing

    What questions are you trying to answer with the interviews?
      Exploratory, looking for a signal vs. providing data/support for primary endpoint?
      Do you need patients from all countries to answer your questions, or a sample of participants?

    Population
    Sample size
    Who is going to conduct interviews?
    Method
    Timelines
    Budget
    Senior-management buy-in

  • Potential Methodological Considerations / Limitations

    How, if at all, exit interview activities influence CT data
    Self-selection bias of exit interview volunteers (site and patient level)
    Sample
      All patients, subsample(s), size
    How data will be analyzed
    How interview data relate to CT data
    Potential for additional adverse event reporting

  • Factors Contributing to a More Successful Interview Study

    General rule of thumb: the more sites and patients, the easier and less expensive it is to recruit

    Include prospectively in clinical trial (vs. relying on sites and patients to volunteer their participation)
      Increases site and patient willingness and compliance
      Increases patient sample size
      Interview substudy can be included as a component of a clinical trial for select countries (does not have to be for the entire study)
      Additional protocol amendments and IRB reviews would not be needed
      Does not significantly add to site burden
      Training for interview substudy adds ~30 minutes to site initiation visits

  • Factors Contributing to a More Successful Interview Study

    Adequate time to design interview substudy and materials

    Target an adequate sample size (e.g., 30-50 interviewed patients)
      More likely to identify themes/signals (vs. 10-15 patients)

    Larger site and patient pool increases likelihood of success
      Easier and more efficient to recruit
      More buy-in from sites and patients

    Include in phase 1B or phase 2 study
      Increases chances of early identification of signals (e.g., treatment benefits, impacts)
      Learn what is important to patients that may not be included in protocols
      Early signals can help inform future study design, PRO measurement strategy, selection of other study endpoints, systematic measurement of new endpoints

  • Exit Interview Study Examples

  • Example 1: Exit Interviews with COPD and Asthma Patients in Prospective, Real World Clinical Studies

    RTI-HS designed and is implementing an exploratory study to capture patient-centered information in the context of two real-world studies being conducted in chronic obstructive pulmonary disease (COPD) and asthma.

    The study is investigating the impact and management of COPD and asthma from the patient's perspective and highlighting the potential relationship between treatment and both behavioral and psychological factors on patients' experiences. Goal is to identify key risk factors for exacerbations and treatment adherence.

    A mixed methods approach is being used:
      Quantitative data are being collected through structured, closed-ended questions administered to all patients via telephone interviews.
      Qualitative data are also being collected through semi-structured, open-ended questions on key topic areas administered to a subset of patients via face-to-face interviews.

  • Example 2: Interviews with Patients with Diabetic Gastroparesis Before and After Treatment

    RTI-HS recently collaborated with a pharmaceutical client developing a new treatment for diabetic gastroparesis (DG)

    Participation in qualitative interviews at both the beginning (pre-treatment) and end (post-treatment) of a phase 2 study was offered to all clinical trial participants

    Primary objective of the pre-treatment interviews was to inform the development of a new PRO measure or modification of an existing PRO measure by:
      Identifying a comprehensive set of DG symptoms
      Learning how patients describe the burden and natural variation in these symptoms
      Understanding the relative bothersomeness of the symptoms
      Describing expectations related to successful treatment

    Primary objective of the post-treatment interviews was to gather in-depth information about participants' experience with the study drug, including the magnitude and relative importance of both positive and negative changes

    A manuscript describing the methods and results of this study has just been submitted for publication

  • Example 3: Exit Interviews with Clinical Trial Participants with Carcinoid Syndrome (CS)

    Task: Regulatory requirement that the client assess and document the relevance and clinical meaningfulness of specific CS-related symptoms and their impacts.

    Designed and implemented a qualitative study to explore the perceptions and experiences of patients following their participation in a clinical trial.

    Conducted telephone exit interviews with 35 patients across 16 sites in 5 countries enrolled in a phase 3 clinical trial investigating a new treatment for carcinoid syndrome to assess:

    Participants' experiences (symptoms and impacts) with their disease

    Perceived benefits of the study treatment

    The clinical meaningfulness of specific symptom improvements and their associated impact on the patients

    Mixed methods (qualitative and quantitative data)

    Data analyzed: qualitative; quantitative; compared with selected clinical trial data

  • Example 3: Exit Interviews with Clinical Trial Participants with Carcinoid Syndrome (CS): Results

    Supported the primary endpoint of decrease in diarrhea

    The 3 most important symptoms to treat and the most bothersome symptoms were diarrhea, BM frequency, and urgency.

    BM frequency was reported as being more important to treat than stool form/consistency.

    Meaningfulness of changes with treatment

    95% of participants who reported reductions in BM frequency noted that this was meaningful to them, allowing them to better enjoy life, leave the house, and participate in social and other activities.

    "I definitely feel like I'm not a prisoner in my house, staying 10 feet to the nearest bathroom. I can go out to activities"

    "But the biggest change is not having to run to the toilet constantly... You can't live going 20 times a day. I was able to go out more often"

    Most participants reported that a BM frequency reduction of at least 30% would be considered meaningful.
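
    As an illustration only, the 30% figure above could be applied to individual patients as a percent reduction from baseline. The sketch below is not the study's analysis; the function names and the daily BM counts are hypothetical.

```python
def percent_reduction(baseline: float, follow_up: float) -> float:
    """Percent reduction from baseline (positive = fewer BMs per day)."""
    if baseline <= 0:
        raise ValueError("baseline frequency must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def is_meaningful(baseline: float, follow_up: float,
                  threshold_pct: float = 30.0) -> bool:
    """True when the reduction reaches the threshold (30% by default)."""
    return percent_reduction(baseline, follow_up) >= threshold_pct


# Hypothetical daily BM counts (baseline, end of treatment); not trial data.
patients = {"P01": (10, 6), "P02": (8, 7), "P03": (20, 14)}
flags = {pid: is_meaningful(b, f) for pid, (b, f) in patients.items()}
print(flags)
```

    Here P02 falls short of the threshold (a 12.5% reduction), while P01 (40%) and P03 (exactly 30%) meet it.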

  • Exploring the Use of Anchor-Based Methods to Derive and Interpret Meaningful Within-Patient Change

    April 4, 2017

    Gwaltney Consulting Confidential

  • ANCHOR-BASED METHODS

    The anchor-based approaches use an external indicator, either clinical or patient-based, to assign subjects into several groupings reflecting no change, small positive changes, large positive changes, small negative changes, or large negative changes in clinical or health status (Revicki 2008; p. 104)

    [Anchor-based methods] anchor change scores on the COA to an external criterion that identifies study subjects who have experienced an important change in their condition (PRO Consortium 2015)

    Meaningful within-person change = Change on the target COA measure for patients who experience meaningful improvement or worsening on the anchor

    Gold standard for estimating meaningful within-person change (FDA 2009)
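
    A minimal sketch of the definition above: group patients by their anchor response and take the mean change on the target COA within each group. The records, the 0-10 scale direction, and the choice of "Moderately Better" as the meaningful-improvement group are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (PGIC response, change in COA score from baseline).
# Negative change = improvement on this made-up 0-10 symptom scale.
records = [
    ("Very much Better", -4.0), ("Moderately Better", -3.0),
    ("Moderately Better", -2.0), ("A Little Better", -1.0),
    ("A Little Better", -2.0), ("No Change", 0.0),
    ("No Change", -1.0), ("A Little Worse", 1.0),
]


def mean_change_by_anchor(rows):
    """Mean COA change score within each anchor category."""
    groups = defaultdict(list)
    for category, change in rows:
        groups[category].append(change)
    return {category: mean(values) for category, values in groups.items()}


by_group = mean_change_by_anchor(records)
# Assumption: "Moderately Better" is taken as the meaningful-improvement group.
meaningful_change = by_group["Moderately Better"]
print(by_group)
```

    Which anchor category marks "meaningful" change is a judgment call, so the same table of group means can yield different thresholds.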

  • TYPES OF ANCHORS

    Global Impression of Change: Patient, Caregiver, Clinician Reported

    Global Impression of Symptoms: Patient Reported

    Disease Severity Categories: e.g., New York Heart Association Classification among heart failure patients

    Occurrence of a Meaningful Event: e.g., hospitalization, disease relapse

    Experience of a certain degree of change on a disease-related variable: e.g., loss of 5% body fat in obese patients (Crosby 2003)

  • EXAMPLE: PGIC

    Please choose the response below that best describes the overall change in your ______ since you started taking the study medication.

    Very much Better

    Moderately Better

    A Little Better

    No Change

    A Little Worse

    Moderately Worse

    Very much Worse

  • ESTIMATING WITHIN-PATIENT MEANINGFUL CHANGE

    [Figure: Distribution of Change Scores by PGIC Category]

    Farrar 2001; Pain

  • EXAMPLE: PGIS

    Please choose the response below that best describes the severity of your ______ over the past week.

    None

    Mild

    Moderate

    Severe

    Very Severe

  • CONSIDERATIONS WHEN SELECTING ANCHORS

    Anchors should be easier to interpret than the PRO measure itself (FDA, 2009)

    Correlation between anchor and target COA should be greater than 0.30-0.40 (Hays 2005; Revicki 2008)

    Should anchor assess change in a specific symptom/function or a more global assessment of health?

    Recall bias with impression of change items

    Most appropriate anchor type for different types of COAs?

    Recommended to use multiple independent anchors and to examine and confirm responsiveness across multiple samples (Revicki 2008)
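
    The correlation criterion above can be checked before an anchor is used. The sketch below computes a plain Pearson correlation on hypothetical data; the numeric PGIC coding (-3 to +3) and the use of 0.30 as the cut-off are assumptions, and a rank-based coefficient such as Spearman's is a common alternative for ordinal anchors.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: PGIC coded -3 (very much worse) to +3 (very much better),
# and COA improvement (baseline minus follow-up, so higher = more improvement).
pgic = [3, 2, 2, 1, 1, 0, 0, -1]
coa_improvement = [4.0, 3.0, 2.0, 1.0, 2.0, 0.0, 1.0, -1.0]

r = pearson_r(pgic, coa_improvement)
# Assumption: the lower end of the 0.30-0.40 range is used as the cut-off.
anchor_is_usable = abs(r) > 0.30
print(round(r, 2), anchor_is_usable)
```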

  • CONSIDERATIONS WHEN USING ANCHOR-BASED METHODS

    Type of analysis to determine meaningful change?

    Descriptive: Average COA score at each level of PGIC

    Formal: Regression analysis, ROC curve

    What level of change should be considered as the marker for meaningful change? Minimal? Moderate? Large?

    Only use estimate from group that has changed? Difference between changed and stable groups?

    Non-linear relationship between anchor and COA score
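
    One possible form of the ROC-curve analysis listed above: treat the anchor as a binary "improved / not improved" label and look for the COA change cut-off that best separates the two groups. Maximizing Youden's J (sensitivity + specificity - 1) is an assumption here, one common selection rule among several, and the data are hypothetical.

```python
def roc_points(scores, improved):
    """Return (cutoff, sensitivity, specificity) for each observed cutoff.

    A patient counts as a responder when score >= cutoff.
    """
    n_pos = sum(improved)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both improved and non-improved patients")
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, improved) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, improved) if not y and s < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def best_cutoff(scores, improved):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(scores, improved), key=lambda p: p[1] + p[2] - 1)


# Hypothetical data: COA improvement scores and a binary anchor
# (True = at least "Moderately Better" on the PGIC, an assumed definition).
coa_improvement = [4.0, 3.0, 2.0, 1.0, 2.0, 0.0, 1.0, -1.0]
improved = [True, True, True, False, False, False, False, False]

cutoff, sensitivity, specificity = best_cutoff(coa_improvement, improved)
print(cutoff, sensitivity, specificity)
```

    With these made-up values the selected cut-off is a 2-point improvement, which captures all improved patients and misclassifies one of the five non-improved patients.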

  • NON-LINEAR RELATIONSHIP BETWEEN PGIC AND PRO

    [Figure: Distribution of Change Scores by PGIC Category]

    Farrar 2001; Pain

  • NON-LINEAR RELATIONSHIP BETWEEN PGIC AND PRO

    [Figure: boxplot; x-axis: answer to CGI question (v. much improved, much improved, minim. improved, no change, minim. worse, much worse, v. much worse)]

    FIG. 1. Boxplot of absolute change from baseline to week 12 in the number of moderate to severe hot flushes by answer to CGI question. CGI, Clinical Global Impression; v., very; minim., minimally; abs., absolute.

    Gerlinger 2012; Menopause

  • CONSIDERATIONS WHEN USING ANCHOR-BASED METHODS

    Use of cross-sectional approaches? e.g., difference between disease severity categories at a single point in time

    Effect of unblinding on PGIC rating?

    False sense of precision: clinical trials are less likely to acknowledge the error associated with estimates

    Different anchors can lead to substantially different findings. How to integrate findings?
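
    One way to make the error around an anchor-based estimate visible is to report an interval alongside the point estimate. The sketch below uses a percentile bootstrap on hypothetical change scores from an "improved" anchor group; the method, the resample count, and the data are illustrative assumptions, not a recommendation from the slides.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Hypothetical COA change scores for patients in the improved anchor group.
changes = [-3.0, -2.0, -2.5, -1.0, -4.0, -2.0, -3.5, -1.5]

estimate = mean(changes)
ci_low, ci_high = bootstrap_ci(changes)
print(f"mean change {estimate:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

    With only eight patients the interval is wide, which is the point: a single threshold value taken from a small anchor group carries considerable uncertainty.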

  • REFERENCES

    Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol. 2003 May;56(5):395-407.

    Farrar JT, Young JP Jr, LaMoreaux L, Werth JL, Poole RM. Clinical importance of changes in chronic pain intensity measured on an 11-point numerical pain rating scale. Pain. 2001 Nov;94(2):149-58.

    Gerlinger C, Gude K, Hiemeyer F, Schmelter T, Schäfers M. An empirically validated responder definition for the reduction of moderate to severe hot flushes in postmenopausal women. Menopause. 2012 Jul;19(7):799-803.

    Hays RD, Farivar SS, Liu H. Approaches and recommendations for estimating minimally important differences for health-related quality of life measures. COPD. 2005 Mar;2(1):63-7.

    PRO Consortium 2015. Interpreting Change in Scores on COA Endpoint Measures

    Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008 Feb;61(2):102-9.
