Clinical Outcome Assessments: Establishing and Interpreting Meaningful Within-Patient Change 4/4/17

Duke Robert J. Margolis, MD Center for Health Policy


CLINICAL OUTCOME ASSESSMENTS: INTERPRETING MEANINGFUL CHANGE

Duke Margolis Expert Workshop April 4, 2017

Elektra J. Papadopoulos, MD, MPH Clinical Outcome Assessments Staff

Office of New Drugs

Center for Drug Evaluation and Research

U.S. Food and Drug Administration

www.fda.gov

Role of Patient Perspective

Dr. Janet Woodcock:

"It turns out that what is really bothering the patient and what is really bothering the doctor can be radically different things."

Framing of FDA Drug Benefit-Risk Assessment

Decision Factor | Evidence and Uncertainties | Conclusions and Reasons

Sets the context for the weighing of benefits and risks:
  How serious is this indicated condition, and why?
  How well is the patient population's medical need being met by currently available therapies?

Benefit. Characterize and assess the evidence of benefit:
  How meaningful is the benefit, and for whom?
  How compelling is the expected benefit in the post-market setting?

Risk. Characterize and assess the safety concerns:
  How serious are the safety signals identified in the submitted data?
  What potential risks could emerge in the post-market setting?

Risk Management. Assess what risk management (e.g., labeling, REMS) may be necessary to address the identified safety concerns

Benefit-Risk Summary and Assessment

FDA's Patient-Focused Drug Development (PFDD) Initiative

Patients are uniquely positioned to inform understanding of the therapeutic context for drug development and evaluation

There is a need for more systematic ways of gathering patient perspective on their condition and treatment options

Patient-Focused Drug Development (PFDD) is part of FDA commitments under PDUFA V*

FDA is convening 24 meetings on specific disease areas in FY 2013-17

Meetings can help advance a systematic approach to gathering input

*The fifth authorization of the Prescription Drug User Fee Act, enacted in 2012

PFDD in Chronic Disease

PFDD meetings routinely ask for patients' perspectives on what an ideal treatment would look like and what clinical benefit would be the most meaningful to them

Concepts such as the emotional impact of disease and the ability to perform activities are often cited by patients as important

E.g., in Parkinson's disease, patients want to know their functional status over time

Depending on the stage of disease, even small amounts of deterioration can make the difference between being able to perform basic activities (e.g., feeding oneself) independently or not

PFDD Next Steps

Advance science of patient input

Engage wider community to discuss methodologically sound approaches that:

  Bridge from initial PFDD meetings to more systematic collection of patients' input

  Generate meaningful input on patients' experiences and perspectives to inform drug development and B-R assessment

  Are fit for purpose in drug development and regulatory context

Provide guidance

  To: patient communities, researchers, and drug developers

  On: pragmatic and methodologically sound strategies, pathways, and methods to gather and use patient input

Interpretation of "Clinically Meaningful"

Statistical significance alone is not sufficient

Clinical benefit: a positive clinically meaningful effect of an intervention, i.e., a positive effect on how an individual feels, functions, or survives.

To establish clinical benefit we consider two questions:

1) Does the assessment measure or reflect something of significance to patients? Relies on patient, caregiver and expert input/engagement

2) Is the magnitude of change at the individual level sufficiently large to affect how patients feel or function in daily life?

Triangulation of Evidence

Multiple methods used to select a benchmark for meaningful change

Often results in a range of values for what is a clinically meaningful benchmark

Triangulation of evidence consists of examining these values to converge on an appropriate value or range of values likely to represent meaningful change in the outcome of interest
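As a rough illustration of triangulation, the sketch below collects candidate thresholds from several methods and reports their range and median so the spread is visible before a value is chosen. The function name, method labels, and numbers are hypothetical and are not taken from any study discussed here; choosing the final benchmark remains a judgment informed by these values.

```python
from statistics import median


def triangulate(estimates):
    """Summarize candidate thresholds produced by several methods.

    `estimates` maps a method label to its candidate threshold, all on
    the same score scale. Returns the lowest, median, and highest value.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    values = sorted(estimates.values())
    return {"low": values[0], "median": median(values), "high": values[-1]}


# Hypothetical values for illustration only.
example = {
    "anchor: patient global impression": -3.0,
    "anchor: clinician rating": -4.0,
    "distribution-based (supportive only)": -5.0,
}
summary = triangulate(example)
print(summary)
```

The summary describes the range to be examined; it does not pick the benchmark by itself.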

Guidance for Industry

Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims

U.S. Department of Health and Human Services, Food and Drug Administration

Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), Center for Devices and Radiological Health (CDRH)

December 2009 Clinical/Medical

Good Measurement Principles

http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM205269.pdf

FDA PRO Guidance defines good measurement principles to consider for "well-defined and reliable" (21 CFR 314.126) PRO measures

All COAs can benefit from the good measurement principles described within the guidance

But judgment and flexibility are needed!

Final PRO Guidance (2009)

Clinically meaningful thresholds may vary by target population: "we will evaluate an instrument's responder definition in the context of each specific clinical trial."

Anchor-based methods emphasized:

  Empiric evidence for any responder definition is derived using anchor-based methods

  "...explore the associations between the targeted concept of the PRO instrument and the concept measured by the anchors"

  Multiple anchors recommended

Distribution-based methods:

  "...should be considered as supportive and are not appropriate as the sole basis for determining a responder definition"

Final PRO Guidance (2009)

Emphasizes the display of individual responses to treatment:

  "...it is possible to present the entire distribution of responses for treatment and control group, avoiding the need to pick a responder criterion. Whether the individual responses are meaningful represents a judgment..."

  "...cumulative distribution displays show a continuous plot of the percent change from baseline on the X-axis and the percent of patients experiencing that change on the Y-axis."

A variety of responder definitions can be identified along the cumulative distribution of response curve.
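The cumulative distribution display described above can be sketched in a few lines. This is a minimal illustration, assuming that negative change means improvement (for example, fewer episodes); the function name and the two small data sets are hypothetical, not trial data.

```python
def cumulative_percent_at_or_below(changes, thresholds):
    """Percent of patients whose change from baseline is <= each threshold.

    Assumes negative change is improvement, so each threshold acts as one
    candidate responder definition read off the cumulative curve.
    """
    n = len(changes)
    if n == 0:
        raise ValueError("no patients supplied")
    return {t: 100.0 * sum(1 for c in changes if c <= t) / n for t in thresholds}


# Hypothetical change-from-baseline values for two arms (illustration only).
placebo = [-3, -2, -2, -1, -1, -1, 0, 0, 1, 2]
active = [-5, -4, -3, -3, -2, -2, -1, -1, 0, 1]
thresholds = [-3, -2, -1, 0]

placebo_cdf = cumulative_percent_at_or_below(placebo, thresholds)
active_cdf = cumulative_percent_at_or_below(active, thresholds)
for t in thresholds:
    print(t, placebo_cdf[t], active_cdf[t])
```

Reading both arms at the same threshold gives the responder percentages for that candidate definition, which is why the full curve lets a reader compare arms at any cut-off.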

But

The presentation of all possible response level cut-off points does not eliminate the need to identify the level of change that is clinically important (or at least to state our uncertainty about that level)

Cumulative Distribution Function (CDF) (DB4, pooled across treatment arms)

[Figure: cumulative distribution curves of change from baseline in nocturia episodes. X-axis: Change from Baseline in Nocturia Episodes (-6 to 2), with the direction of improvement marked; Y-axis: 0 to 100. Curves: Much Better (n=298), Somewhat Better (n=288), Not Changed (n=185).]

Source: Dr. Jia Guo; Bone, Reproductive and Urologic Drugs Advisory Committee 10/19/2016

CDF Plot by Treatment Arms (DB4)

[Figure: cumulative distribution curves of change from baseline in nocturia episodes by arm. X-axis: Change from Baseline in Nocturia Episodes (-6 to 2), with the direction of improvement marked; Y-axis: 0 to 100. Curves: SER 120 1.5 mcg (n=260), Placebo (n=260). A value of 36% is annotated on the plot.]

Source: Dr. Jia Guo; Bone, Reproductive and Urologic Drugs Advisory Committee 10/19/2016

Establishing Meaningful Change:

Examples from FDA Guidance

Alzheimer's Disease: Developing Drugs for the Treatment of Early Stage Disease (2013)

  Co-primary endpoint of cognitive test and a functional or global assessment

  The intent of this dual measurement is to ensure the clinical meaningfulness of a cognitive benefit that may be observed

Irritable Bowel Syndrome: Clinical Evaluation of Drugs for Treatment (2012)

  Patient global assessments

  Example: How would you rate your IBS signs or symptoms overall over the past 7 days?

Analgesic Indications: Developing Drugs and Biological Products (2014)

  Allows the use of a responder analysis (e.g., 30% reduction in pain with early discontinuation counted as failure) in addition to differences in group means

  Encourages use of cumulative distribution functions in the package insert
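The responder rule mentioned for analgesic trials (at least a 30% reduction in pain, with early discontinuation counted as failure) can be written out as a small sketch. The `Patient` record, its field names, and the example values are hypothetical; handling of missing final scores as non-response is an added assumption, not something stated in the guidance summary above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    baseline_pain: float
    final_pain: Optional[float]  # None when no final score was recorded
    discontinued_early: bool


def is_responder(p: Patient, min_reduction: float = 0.30) -> bool:
    """Apply the responder rule to one patient."""
    # Early discontinuation counts as failure regardless of scores.
    if p.discontinued_early or p.final_pain is None:
        return False
    if p.baseline_pain <= 0:
        return False  # percent reduction is undefined without baseline pain
    reduction = (p.baseline_pain - p.final_pain) / p.baseline_pain
    return reduction >= min_reduction


def responder_rate(patients) -> float:
    """Proportion of patients classified as responders."""
    return sum(is_responder(p) for p in patients) / len(patients)


# Hypothetical patients for illustration only.
cohort = [
    Patient(8.0, 5.0, False),   # 37.5% reduction: responder
    Patient(8.0, 6.0, False),   # 25% reduction: non-responder
    Patient(10.0, 2.0, True),   # large reduction but discontinued: failure
    Patient(10.0, 6.0, False),  # 40% reduction: responder
]
print(responder_rate(cohort))
```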

A word about MID and MCID

Minimum important difference (MID) was removed from the Final PRO Guidance 2009

  Confusion resulted from the term being used interchangeably to indicate either group-level mean differences or individual-level change

  Use of the term "minimal" is problematic: while a minimal amount of change may be noticeable, it does not necessarily imply the change is meaningful to patients

Minimal clinically important difference (MCID), i.e., the smallest difference in score

Beyond Anchor-based and

Distribution-based methods:

Examples of Emerging Methods

Bookmarking/Standard Setting

  Patients and experts are presented with clinical vignettes of a disease in order to reach a consensus on thresholds for severity levels

  Designed for measures that have been calibrated using an IRT model

Scale judgment

  Panels of judges evaluate pairs of completed tests to determine whether the amount of change specified by the responses before and after treatment is meaningful

Exit interviews

  Interviews of patients who recently completed a clinical trial can be used to collect qualitative and quantitative data about patients' experience of disease or treatment burden and changes during the course of the clinical trial

Others

Today's Goals

Advance the discussion on methods to identify meaningful within-patient change in COAs by discussing key issues and major challenges, including:

What are the advantages and disadvantages of each of the methods?

How might threshold determinations differ across the four types of COAs?

What are special considerations for establishing meaningful change in small and heterogeneous study populations?

How and when could these methods be most feasibly used in drug development?

U.S. FOOD & DRUG

ADMINISTRATION

Exploring the Use of Emerging Methods to Derive and Interpret Meaningful Within-Patient Change Using Idio

Scale-Judgment (Bookmarking/Standard-Setting)

Karon F. Cook

Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University,

Chicago, IL

April 4, 2017

Washington, DC

Background

How much we know

Building State of the Art Measures

How to Interpret Scores on State of the Art Measures

Meaningful Change

Methods for Defining Meaningful Score

Differences

Statistical

Global Ratings of Change

External Anchors

[Figure: response probability curves for the fatigue item "How often are you too tired to socialize with family?" Response options: never, rarely, sometimes, often, always. X-axis: Fatigue (T-score, 30 to 70); Y-axis: Probability of specified response (0.0 to 1.0).]

Item Response Samples, T=40

never was too tired to do household chores.

never needed to sleep during the day.

rarely had trouble finishing things because she was too tired.

rarely was so tired that she needed to rest during the day.

rarely felt that she had no energy.

Item Response Samples, T=62

sometimes was too tired to eat.

often had trouble finishing things because she was too tired.

often was too tired to do her household chores.

often needed to sleep during the day.

always frustrated by being too tired to do the things she wanted to do.

[Scale graphic: FATIGUE T-score axis, 30 to 70]

National Multiple Sclerosis Society

Grant #H00145 Deborah Miller, PI


Online panel of 500 participants with Multiple Sclerosis

Responded to NeuroQoL Fatigue Short Form

Developed 18 five-item sample response sets, 2 points apart.

Branched into 7 fatigue levels (e.g., 48-51)

Presented with 7 response samples

SCREEN SHOTS

[Scale graphic: MY FATIGUE versus OTHER PERSON'S FATIGUE]

In PART B, you will

Look at the fatigue reports of 7 people who have MS

Compare each person's fatigue to your own fatigue. For example, your fatigue might be greater.

Or, you might decide your fatigue is the SAME or LESS than that other person's.

Would Matter

If you decide your fatigue is DIFFERENT from the other person's, you will then

Consider what it would be like to have this person's fatigue, and

Decide if the difference would matter to you in your daily life.

Depending on your own fatigue, you may decide that none, some, or all of these people have more, less, or the same amount of fatigue.

There are no right answers, just your own thoughtful judgments.

[T Score = 58]

This is what Ms. Anderson said about her fatigue over the last 7 days. She reported that she:

sometimes felt weak all over.

often had to limit social activity because she was tired.

sometimes had trouble starting things because she was too tired.

often was too tired to take a short walk.

Compared to Ms. Anderson's, has YOUR FATIGUE been:

Greater than Ms. Anderson's

The same as Ms. Anderson's

Less than Ms. Anderson's

You said YOUR FATIGUE over the past week was Greater

If your fatigue IMPROVED to Ms. Anderson's level, would it make a difference in your daily life?

It wouldn't really make a difference in my daily life.

It would make a difference in my daily life (things I do day-to-day would be easier).

[Scale graphic: MY FATIGUE versus MS. ANDERSON'S FATIGUE, from Less to Greater]

This is what Ms. Anderson said about her fatigue over the last 7 days. She reported that she:

often or always had to limit social activity because she was tired.

You said YOUR FATIGUE over the past week was LESS than MS. ANDERSON'S FATIGUE.

If your fatigue WORSENED to Ms. Anderson's level, would it make a difference in your daily life?

It wouldn't really make a difference in my daily life.

It would make a difference in my daily life (many of the things I do day-to-day would be harder).

[Scale graphic: MY FATIGUE versus MS. ANDERSON'S FATIGUE, from Less to Greater]

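[Figure: T-score scale from 52 to 64 in 2-point steps, with vignettes labeled Butler, Richardson, Woods, Anderson, Foster, Allen, Harris. Respondent T = 58.8, Score Group 56-60. Distance labels as shown: 7 points >, 5 points >, 3 points >, 1 pt, 5 points <, 3 points <, 1 pt. Asterisks mark individual vignettes.]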

ANALYSIS

Results

Analyses to Estimate Thresholds for Interpreting Change

Calculate the minimum distance endorsed by each respondent as a meaningful improvement/decrement

Identify thresholds that would capture different percentages of respondents' minimums.
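The two analysis steps above can be sketched as follows. This is a minimal illustration under stated assumptions: each respondent's judgments are given as (distance in T-score points, endorsed as meaningful) pairs, and a nearest-rank percentile is used to find the threshold that captures a given share of minimums. The function names, the percentile convention, and the data are hypothetical; the study's actual computation is not described on the slide.

```python
import math


def minimum_endorsed_distance(judgments):
    """Smallest distance a respondent endorsed as meaningful, or None.

    `judgments` is an iterable of (distance_in_t_points, endorsed) pairs.
    """
    endorsed = [distance for distance, ok in judgments if ok]
    return min(endorsed) if endorsed else None


def threshold_capturing(minimums, percent):
    """Nearest-rank percentile of respondents' minimum endorsed distances.

    Returns the smallest observed value that is greater than or equal to
    at least `percent` percent of the minimums. Respondents with no
    endorsed distance (None) are left out, which is an assumption.
    """
    values = sorted(m for m in minimums if m is not None)
    if not values:
        raise ValueError("no endorsed distances available")
    rank = max(1, math.ceil(percent / 100 * len(values)))
    return values[rank - 1]


# Hypothetical respondents for illustration only.
respondents = [
    [(1, False), (3, True), (5, True)],
    [(1, True), (3, True)],
    [(1, False), (3, False), (5, True)],
    [(1, False), (3, True)],
]
minimums = [minimum_endorsed_distance(r) for r in respondents]
print(minimums)
print({p: threshold_capturing(minimums, p) for p in (50, 75, 90)})
```

A larger percentile gives a more conservative threshold: it is large enough to count as meaningful for more respondents, at the cost of exceeding what many of them required.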

Thresholds for Worsening

[Figure: scatter plot of individual thresholds for worsening. X-axis: Neuro-QoL Fatigue T-Score (20 to 80); Y-axis: 0 to 14. Reference lines: 50th, 75th, and 90th percentiles; the mean of individual thresholds is also marked.]

Threshold locations for capturing 50, 75, and 95% of distances endorsed as important worsening

Thresholds for Improvement

[Figure: scatter plot of individual thresholds for improvement. X-axis: Neuro-QoL Fatigue T-Score (20 to 80); Y-axis: 0 to -17. Reference lines: 50th Percentile = -3.3, 75th Percentile = -5.1, 90th Percentile = -7.3; the mean of individual thresholds is also marked.]

Threshold locations for capturing 50, 75, and 95% of distances endorsed as important improvement

[Figure: T-score scale from 52 to 64 with vignettes labeled Richardson, Woods, Anderson, Butler and distance labels 7 points >, 5 points >, 3 points >, 1 pt.]

8% Reversals; 92% Current Judgments Consistent with Prior Judgment

Inconsistent Judgment

[Figure: scale graphic comparing MY FATIGUE with MS. ANDERSON'S FATIGUE (Less to Greater) on a T-score axis from 52 to 64, with the Anderson vignette marked.]

7.6% of all judgments were inconsistent with the one prior (of 3000 opportunities)

FREQUENCY OF INCONSISTENT JUDGMENTS

[Bar chart: categories 0, 1, 2, 3; bar labels 344, 94, 51, 11; vertical axis 0 to 400.]
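As a quick consistency check on the chart values, the arithmetic below assumes the bars read 344, 94, 51, and 11 participants with 0, 1, 2, and 3 inconsistent judgments respectively. That pairing is an interpretation of the extracted chart, not something the slide states in words, though it agrees with both the panel size of 500 and the reported 7.6%.

```python
# Assumed reading of the bar chart: number of inconsistent judgments -> participants.
counts = {0: 344, 1: 94, 2: 51, 3: 11}

participants = sum(counts.values())
inconsistent_judgments = sum(k * n for k, n in counts.items())
opportunities = 3000  # as reported on the slide

rate_percent = 100 * inconsistent_judgments / opportunities
print(participants, inconsistent_judgments, round(rate_percent, 1))
```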

Qual Life Res (2017) 26:847-857 DOI 10.1007/s11136-016-1414-

When global rating of change contradicts observed change: Examining appraisal processes underlying paradoxical responses over time

Carolyn E. Schwartz, Victoria E. Powell, Bruce D. Rapkin

Accepted: September 2016 / Published online: October 2016

Mental health component scores over time in MS, N=525 (score categories shown: Declined, Unchanged)

On the global rating of change (GROC), 48.6% made a paradoxical judgment: reporting worse status when observed score was unchanged or endorsing the same status when observed scores had declined.

We were able to estimate plausible responder thresholds for consequential change.

Participants evaluated a range of IRT-Vs close to their own fatigue levels.

Participants reported high confidence in their judgments (3.3, between moderately and highly confident).

Judgments contextualized in a patient-relevant context: "make a difference in daily life."

Design allowed large samples.

Judgment errors existed, but were within range of other methods.

Judgment errors not strongly associated with demographics (e.g., education)

How can we do better

Qualitative research to understand what is important to people in assessing change.

Cognitive debriefing to understand what people are attending to. Are they attending to different things? Should the selection of concepts in the vignettes be standardized?

Frame vignettes by what is important to people.

Try to understand the variation in levels of change that people believe is important.

Can set a threshold that is most representative, but it will not catch everyone.

Change study design so that everyone is getting the same distances. Branch on every score.

An Alternative Model:

Some Remarks about

(Educational) Standard Setting, Characterizing Meaningful Change, and the Scale-Judgment Method

David Thissen

L.L. Thurstone Psychometric Laboratory

The University of North Carolina at Chapel Hill

Minimally Important Difference (MID) estimation is not like answering the question

"What is the ratio of the circumference of a circle to its diameter?"

Minimally Important Difference (MID) estimation is like

At least for regulatory purposes, MID is like a speed limit: A policy decision informed by data

(And likely between 0.2 and 0.5 standard units)
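To make the "0.2 to 0.5 standard units" remark concrete, the sketch below converts standard units into T-score points. It assumes the usual T-score metric (standard deviation of 10 in the reference population); the function name is hypothetical, and a different measure would need its own standard deviation.

```python
T_SCORE_SD = 10.0  # assumed: T-scores are scaled to SD 10 in the reference population


def standard_units_to_points(effect_in_sd, sd=T_SCORE_SD):
    """Convert a difference expressed in standard units to score points."""
    return effect_in_sd * sd


low_points = standard_units_to_points(0.2)
high_points = standard_units_to_points(0.5)
print(low_points, high_points)
```

On that metric, the range corresponds to roughly 2 to 5 T-score points.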

"Most authorities on standard setting (e.g., Green, Trimble, and Lewis, 2003; Hambleton, 1980; Jaeger, 1989; Shepard, 1980; Zieky, 2001) suggest that, when setting cut scores, it is prudent to use and compare results from different standard setting methods." (p. 155)

Green, D.R., Trimble, C.S., and Lewis, D.M. (2003). Interpreting the results of three different standard-setting procedures. Educational Measurement: Issues and Practice, 22, 22-32.

Hambleton, R.K. (1980). Test score validity and standard setting methods. In D.C. Berliner (Ed.), Criterion-referenced measurement: The state of the art (pp. 80-123). Baltimore, MD: Johns Hopkins University Press.

Jaeger, R.M. (1989). Certification of student competence. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 485-514). Washington, DC: American Council on Education.

Shepard, L.A. (1980). Standard setting issues and methods. Applied Psychological Measurement, 4, 447-467.

Zieky, M.J. (2001). So much has changed: How the setting of cutscores has evolved since the 1980s. In G.J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum.

Section II: Standard Setting Methods

Chapter 4. The Nedelsky Method

Chapter 5. The Ebel Method

Chapter 6. The Angoff Method and Angoff Variations

Chapter 7. The Direct Consensus Method

Chapter 8. The Contrasting Groups and Borderline Group Methods

Chapter 9. The Body of Work and Other Holistic Methods

  The Body of Work Method

  The Judgmental Policy Capturing Method

  The Dominant Profile Method

  The Analytic Judgment Method

Chapter 10. The Bookmark Method

Chapter 11. The Item-Descriptor Matching Method

Chapter 12. The Hofstee and Beuk Methods

National Assessment of Educational Progress (NAEP) Achievement Levels

Grade 4 Mathematics, 1996

Below Basic Basic Proficient Advanced

100 150 200 250 300 350 NAEP Scale Score

A Ranking Procedure to Find Score Ranges Associated with Mild, Moderate, and Severe Conditions

(Cella et al., 2008, 2014)

Cella, D., Choi, S., Rosenbloom, S., Surges Tatum, D., Garcia, S., Lai, J.-S., George, J., & Gershon, R. (2008). A novel IRT-based case-ranking approach to derive expert standards for symptom severity (paper presentation), International Society for Quality of Life Research Annual Scientific Meeting. Montevideo, Uruguay.

Cella, D., Choi, S., Garcia, S., Cook, K. F., Rosenbloom, S., Lai, J. S., Tatum, D. S., & Gershon, R. (2014). Setting standards for severity of common symptoms in oncology using the PROMIS item banks and expert judgment. Quality of Life Research, 23, 2651-2661.


Sample Fatigue Vignette, T-score 40

Rating ______

FATIGUE PINK

1 How often did your fatigue make you feel less alert? Never / Rarely / Sometimes / Often / Always

2 How often did you have trouble starting things because of your fatigue? Never / Rarely / Sometimes / Often / Always

3 How often did you feel run-down? Never / Rarely / Sometimes / Often / Always

4 How often were you energetic? Never / Rarely / Sometimes / Often / Always

5 How easily did you find yourself getting tired on average? Not at all / A little bit / Somewhat / Quite a bit / Very much

Graphic from: Cella, D., Choi, S., Rosenbloom, S., Surges Tatum, D., Garcia, S., Lai, J.-S., George, J., & Gershon, R. (2008). A novel IRT-based case-ranking approach to derive expert standards for symptom severity (paper presentation), International Society for Quality of Life Research Annual Scientific Meeting. Montevideo, Uruguay.

Sample Depression Vignette, T-score 60

Rating ______

DEPRESSION - MINT

1 I felt that I had nothing to look forward to. Never / Rarely / Sometimes / Often / Always

2 I felt that I wanted to give up on everything. Never / Rarely / Sometimes / Often / Always

3 I felt disappointed in myself. Never / Rarely / Sometimes / Often / Always

4 I felt lonely. Never / Rarely / Sometimes / Often / Always

5 I felt I had no reason for living. Never / Rarely / Sometimes / Often / Always

ANXIETY - Case Examples Exercise

Step 1: Please review the ten different cards in the "Anxiety" envelope. Each card represents a patient who falls along a different place on the anxiety continuum. Sort the cards in order from least severe to most severe, giving each color a ranking ("1" being least severe).

Please enter the card color (e.g. "Pink", "Blue", etc.) below the number ranking that you have assigned it. You are encouraged to give each card a unique ranking, but this is not required. If you believe two patients are tied, for example, at rank "6", then write both color names under the number "6."

Least severe ... Most severe

1 2 3 4 5 6 7 8 9 10

COLOR:

COLOR: (if applicable)

COLOR: (if applicable)

Step 2: Now please draw three vertical lines between ranks (e.g. between "3" and "4"); one delineating each of the following:

1. A separation between those cards (i.e. patients) that you believe represent a normal level of anxiety and a mild level of anxiety

2. A separation between those cards representing a mild level of anxiety and a moderate level of anxiety

3. A separation between those cards representing a moderate level of anxiety and a severe level of anxiety

Expert Ranking Sheet: Anxiety, Step 1



Expert Ranking Sheet: Anxiety, Step 2

[Figure: distribution of Anxiety T-scores with the score range divided into Normal, Mild, Moderate, and Severe. X-axis: T-Score (30 to 80); Y-axis: proportion. Values 52.5 and 62.5 are marked on the plot.]

After a consensus-building process

Normal Mild Moderate Severe

A Bookmarked-Vignettes Procedure to Find Score Ranges Associated with Mild, Moderate, and Severe Conditions

(Cook et al., 2014; Morgan et al., 2017)

Cook, K. F., Victorson, D. E., Cella, D., Schalet, B. D., & Miller, C. (2014). Creating meaningful cut-scores for Neuro-QOL measures of fatigue, physical functioning and sleep disturbance using standard setting with patients and providers. Quality of Life Research, 24, 575-589.

Judges do not know this vignette is for a T-score of 47.5

Ms. Miller's Fatigue

In the last 7 days, Ms. Miller rarely felt weak all over and rarely was so tired she couldn't take a short walk. However, she sometimes felt tired, which got in the way of her doing her household chores. Feeling too tired to do the things she wanted to do was sometimes frustrating for her.

In summary, Ms. Miller reports being:

Rarely weak all over.

Rarely too tired to take a short walk.

Sometimes tired.

Sometimes too tired to do household chores.

Sometimes frustrated by being too tired to do the things she wanted to do.

[Figure: bookmarking materials. Nine vignette cards (Anna's Pain, Julia's Pain, Andrea's Pain, Jacob's Pain, Chloe's Pain, Kristen's Pain, Maya's Pain, Claire's Pain, Addison's Pain) with bookmarks labeled No Problems / Mild Problems, Mild Problems / Moderate Problems, and Moderate Problems / Severe Problems. The legible sample text from the cards follows.]

In the last 7 days, Mr. Turner has rarely felt tired. Often, he has felt alert when he woke up and ready to start the day. In the last 7 days, he has never had trouble sleeping because of bad dreams and never had a hard time controlling his emotions because of poor sleep.

In summary, Mr. Turner reported:

Never having trouble sleeping because of bad dreams.

Never having a hard time controlling his emotions because of poor sleep.

Rarely feeling tired.

Often feeling alert when he woke up.

Often waking up and feeling ready to start the day.

Graphic from: DeWitt, E.M. (2015, February 6). Establishing clinical meaning and defining important differences for PROMIS measures in Juvenile Idiopathic Arthritis. Presentation at UNC PROMIS Pediatric Investigators Meeting, Chapel Hill, NC.

Graphic from: Morgan, E.M., Mara, C.A., Huang, B., Barnett, K., Carle, A.C., Farrell, J.E., & Cook, K.F. (2017). Establishing Clinical Meaning and Defining Important Differences for Patient-Reported Outcomes Measurement Information System (PROMIS) Measures in Juvenile Idiopathic Arthritis Using Standard Setting with Patients, Parents, and Providers. Quality of Life Research, 26, 565-586.

The Scale-Judgment Method to Estimate the Minimally Important Difference (MID) between Scores

(Thissen et al., 2016)

Thissen, D., Liu, Y., Magnus, B., Quinn, H., Gipson, D.S., Dampier, C., Huang, I-C., Hinds, P.S., Reeve, B.B., Gross, H.E., & DeWalt, D.A. (2016). Estimating minimally important difference (MID) in PROMIS pediatric measures using the scale-judgment method. Quality of Life Research, 25, 13-23.

A minimally important difference (MID) has been defined as "the smallest difference in score that patients perceive as important, and which would lead the clinician to consider a change in the patient's management"

Guyatt et al. (2002)

Existing methods:

Distribution-based indices (not an empirical method; merely expresses change in standard units)

Anchor-based methods ("contrasting groups" in educational standard setting)

Guyatt, G. H., Osoba, D., Wu, A. W., Wyrwich, K. W., & Norman, G. R. (2002). Methods to explain the clinical significance of health status measures. Mayo Clinic Proceedings, 77, 371-383.

Revicki, D., Hays, R. D., Cella, D., & Sloan, J. (2008). Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. Journal of Clinical Epidemiology, 61, 102-109.

Earlier judgment-based methods

Delphi Method, Delphi plus anchor, physician survey, expert panels using visual analog scales or changes to item responses

Bellamy, N., Anastassiades, T. P., Buchanan, W. W., Davis, P., Lee, P., McCain, G. A., Wells, G. A., & Campbell, J. (1991). Rheumatoid arthritis antirheumatic drug trials. III. Setting the delta for clinical trials of antirheumatic drugs--results of a consensus development (Delphi) exercise. Journal of Rheumatology, 18, 1908-1915.

Bellamy, N., Buchanan, W. W., Esdaile, J. M., Fam, A. G., Kean, W. F., Thompson, J. M., Wells, G. A., & Campbell, J. (1991). Ankylosing spondylitis antirheumatic drug trials. III. Setting the delta for clinical trials of antirheumatic drugs--results of a consensus development (Delphi) exercise. Journal of Rheumatology, 18, 1716-1722.

Bellamy, N., Carette, S., Ford, P. M., Kean, W. F., le Riche, N. G., Lussier, A., Wells, G. A., & Campbell, J. (1992). Osteoarthritis antirheumatic drug trials. III. Setting the delta for clinical trials--results of a consensus development (Delphi) exercise. Journal of Rheumatology, 19, 451-457.

Spiegel, B. M., Younossi, Z. M., Hays, R. D., Revicki, D., Robbins, S., & Kanwal, F. (2005). Impact of hepatitis C on health related quality of life: a systematic review and quantitative assessment. Hepatology, 41, 790-800.

Wyrwich, K. W., Metz, S. M., Kroenke, K., Tierney, W. M., Babu, A. N., & Wolinsky, F. D. (2007). Triangulating patient and clinician perspectives on clinically important differences in health-related quality of life among patients with heart disease. Health Services Research, 42(6 Pt 1), 2257-2274; discussion 2294-2323.

Wells, G., Li, T., Maxwell, L., MacLean, R., & Tugwell, P. (2007). Determining the minimal clinically important differences in activity, fatigue, and sleep quality in patients with rheumatoid arthritis. Journal of Rheumatology, 34, 280-289.

Rai, S. K., Yazdany, J., Fortin, P. R., & Avina-Zubieta, J. A. (2015). Approaches for estimating minimal clinically important differences in systemic lupus erythematosus. Arthritis Research and Therapy, 17, 143.

van Walraven, C., Mahon, J. L., Moher, D., Bohm, C., & Laupacis, A. (1999). Surveying physicians to determine the minimal important difference: implications for sample-size calculation. Journal of Clinical Epidemiology, 52, 717-723.

Todd, K. H., & Funk, J. P. (1996). The minimum clinically important difference in physician-assigned visual analog pain scores. Academic Emergency Medicine, 3, 142-146.

Dempster, H., Porepa, M., Young, N., & Feldman, B. M. (2001). The clinical meaning of functional outcome scores in children with juvenile arthritis. Arthritis and Rheumatology, 44, 1768-1774.

Gong, G. W., Young, N. L., Dempster, H., Porepa, M., & Feldman, B. M. (2007). The Quality of My Life questionnaire: the minimal clinically important difference for pediatric rheumatology patients. Journal of Rheumatology, 34, 581-587.

[Figure: a pair of completed Depressive Symptoms questionnaires, one labeled "One month ago" (T-score 62.1) and one labeled "Today" (T-score 58.9). Items: "I felt alone."; "I felt like I couldn't do anything right."; "I felt everything in my life went wrong."; "I felt sad."; "I thought that my life was bad."; "I could not stop feeling sad."; "I felt lonely."; "I felt unhappy." Response options: never, almost never, sometimes, often, almost always.]

The scale-judgment method presents judges with pairs of questionnaires, artificially completed using IRT, with scores known to the experimenter but not the judges.


The judges (clinicians, adolescents, parents) judge for each pair whether the (imaginary) respondent is doing or feeling better, worse, or about the same.

For the Depressive Symptoms example, this process yields data with summary statistics like these:

      Scale Score                     Frequency                     Proportion
Pair  1 month ago  Today  Difference  Better  No difference  Worse  Wrong Direction
2     49.5         57.9   8.4         23      19             185    0.10
3     56.7         62.1   5.4         32      18             176    0.14
1     43.5         45.9   2.4         15      151            61     0.07
5     64.3         62.1   -2.2        133     66             27     0.12
4     62.1         58.9   -3.2        179     33             15     0.07
6     73.4         66.0   -7.4        189     21             17     0.07

There were more data for Fatigue, Mobility, and Pain.
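The "Proportion Wrong Direction" column of the Depressive Symptoms table can be recomputed from the frequencies. The sketch below is a check under one assumption that the slides do not state explicitly: because higher Depressive Symptoms scores are worse, a "better" judgment is treated as the wrong direction when the score rose, and a "worse" judgment as the wrong direction when it fell. The function and variable names are illustrative.

```python
# Rows copied from the Depressive Symptoms table:
# (pair, score one month ago, score today, n "better", n "no difference", n "worse")
ROWS = [
    (2, 49.5, 57.9, 23, 19, 185),
    (3, 56.7, 62.1, 32, 18, 176),
    (1, 43.5, 45.9, 15, 151, 61),
    (5, 64.3, 62.1, 133, 66, 27),
    (4, 62.1, 58.9, 179, 33, 15),
    (6, 73.4, 66.0, 189, 21, 17),
]


def proportion_wrong_direction(before, today, better, same, worse):
    """Share of judges whose judgment contradicts the sign of the score change.

    Assumption: a rising score makes "better" the wrong direction,
    and a falling score makes "worse" the wrong direction.
    """
    wrong = better if today > before else worse
    return wrong / (better + same + worse)


for pair, before, today, better, same, worse in ROWS:
    p = proportion_wrong_direction(before, today, better, same, worse)
    print(pair, round(today - before, 1), round(p, 2))
```

Under that assumption the recomputed proportions agree with the tabled values to two decimal places.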

If the judges were homogeneous, data analysis could be logistic regression of the probability of a "different" judgment on the scale score difference, with the 50-50 point as the MID:

[Figure: logistic curve of P("different"), from 0.0 to 1.0, against Scale Score Difference, from 0 to 8]
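The homogeneous-judges idea can be sketched as a pooled logistic regression whose 50-50 point is read off as -a/b. This is an illustration, not the analysis the authors report. It uses the Depressive Symptoms frequencies from the table, and it assumes that a "different" judgment means a judgment in the direction of the score change, with wrong-direction judgments omitted from the denominator. The fitting routine is a plain Newton's method written for this example.

```python
import math


def fit_logistic(x, successes, trials, iterations=50):
    """Fit P = 1 / (1 + exp(-(a + b*x))) to grouped binomial data by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, ki, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = ni * p * (1.0 - p)          # binomial weight
            g_a += ki - ni * p              # gradient of the log-likelihood
            g_b += xi * (ki - ni * p)
            h_aa += w                       # negative Hessian entries
            h_ab += xi * w
            h_bb += xi * xi * w
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < 1e-10:
            break
    return a, b


# Absolute scale-score differences and counts from the Depressive Symptoms table.
abs_difference = [8.4, 5.4, 2.4, 2.2, 3.2, 7.4]
n_different = [185, 176, 61, 133, 179, 189]   # judgments in the direction of the change
n_judged = [204, 194, 212, 199, 212, 210]     # "different" plus "no difference"

intercept, slope = fit_logistic(abs_difference, n_different, n_judged)
mid_estimate = -intercept / slope             # difference at which P("different") = 0.5
print(round(mid_estimate, 2))
```

Because this pooled fit ignores differences among judges, its estimate need not match the values reported later in the presentation.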

But the judges were not homogeneous.


So we treated the pairs of questionnaires as items, the same-different judgments as item responses, and fitted the data with the 1PL IRT model:

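[Figure: 1PL item response curves for the questionnaire pairs. Vertical axis: P("different"), 0.0 to 1.0. Horizontal axis: propensity to respond "different", -3 to 3. Curve labels: 5.7, 3.6, 2.4, 1.1, 5.6, 2.3.]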
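For readers less familiar with the 1PL model, the following sketch writes out its response function in the roles used here: theta is a judge's propensity to respond "different" and b is the location of a questionnaire pair. The common slope is fixed at 1 as a simplifying assumption, and the b values are hypothetical.

```python
import math


def p_different(theta: float, b: float) -> float:
    """1PL probability that a judge with propensity theta calls a pair "different".

    b is the location of the questionnaire pair on the same scale as theta.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# An average judge (theta = 0) facing hypothetical pairs with different locations b.
for b in (-2.0, -1.0, 0.0, 1.0):
    print(b, round(p_different(0.0, b), 3))
```

A pair with b = 0 is judged "different" half of the time by an average judge, which is why the location b = 0 plays the role of the 50-50 point.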


Then we interpolated the scale-score difference for a hypothetical item that would be judged different 50% of the time by an average respondent:

[Figure: the 1PL item response curves repeated, P("different") against propensity to respond "different"]

We used quadratic regression to interpolate the scale-score difference associated with a pair of questionnaires that would have a 1PL b of zero:

[Figure: two panels, "Wrong Direction Omitted" and "Wrong Direction Reversed", plotting Scale Score Difference (2 to 8) against b (-3 to 1), with separate symbols for Dep. Symp., Fatigue, Mobility, and Pain]
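The interpolation step can be sketched as an ordinary least-squares quadratic in b whose intercept is the scale-score difference at b = 0. The (b, difference) points below are hypothetical values chosen only to resemble the plotted ranges; they are not the study's estimates, and the small solver is written for this example.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2; returns (c0, c1, c2)."""
    # Build the 3x3 normal equations (X'X) c = X'y as an augmented matrix.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Hypothetical 1PL locations (b) and scale-score differences for six pairs.
b_values = [-2.6, -2.1, -1.2, -0.4, 0.3, 0.8]
score_differences = [8.0, 7.1, 5.2, 3.4, 2.3, 1.9]

c0, c1, c2 = fit_quadratic(b_values, score_differences)
mid_at_b_zero = c0   # the fitted curve evaluated at b = 0
print(round(mid_at_b_zero, 2))
```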


             Wrong Direction Omitted    Wrong Direction Omitted
             MID    s.e.                MID    s.e.
Clinicians   2.1    0.6                 1.9    0.6
Adolescents  2.2    0.6                 2.1    0.6
Parents      2.4    0.7                 2.2    0.7

MID is about two points on the T-score scale for these health outcomes measures, with no clear difference among the domains.


A Free-Response Method to Estimate the Minimally Important Difference (MID) between Scores

(Morgan et al., 2017)

Morgan, E. M., Mara, C. A., Huang, B., Barnett, K., Carle, A. C., Farrell, J. E., & Cook, K. F. (2017). Establishing Clinical Meaning and Defining Important Differences for Patient-Reported Outcomes Measurement Information System (PROMIS) Measures in Juvenile Idiopathic Arthritis Using Standard Setting with Patients, Parents, and Providers. Quality of Life Research, 26, 565-586.

DeWitt, Cook, and their colleagues also used something like the scale-judgment method, but with the judges filling out the responses to the "after" protocol to make it minimally different from the (given) "pre" protocol.

This can be conceptualized as a free-response variant of the scale-judgment method.


Future Research

Do these different methods of data collection yield consistent results? Or are there predictable differences?

How do results from these methods compare to results obtained with anchor-based methods, when anchors are available?

Everyone finds differences between groups of judges... adolescents, parents, clinicians; what is to be made of that?

THE UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Acknowledgments

This work was funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant 1U01AR052181-01.

Thanks to Dave Cella, Karon Cook, and Esi Morgan for their graphics used in this presentation, and my collaborators Yang Liu, Brooke Magnus, Hally Quinn, Debbie S. Gipson, Carlton Dampier, I-Chan Huang, Pamela S. Hinds, Bryce B. Reeve, Heather E. Gross, and Darren A. DeWalt at UNC and across the rest of the PROMIS pediatric multi-site project.

Clinical Trial Exit Interviews

Presented at the Clinical Outcome Assessments: Establishing and Interpreting Meaningful Within-Patient Change Meeting

The Duke-Margolis Center for Health Policy, Washington, DC, April 4, 2017

Dana DiBenedetti, PhD Executive Director, Patient-Centered Outcomes Assessment

RTI Health Solutions

Acknowledgments

T. Michelle Brown, Carla (DeMuro) Romano, Lynda Doward, Claire Ervin, Sheri Fehnel, Sandy Lewis, Diane Whalley




What is an Exit Interview?

The collection of (mostly) qualitative data from clinical trial participants
  Most commonly, interviews are conducted soon after participants complete the treatment period

However, patients' (and/or caregivers') experiences and perspectives regarding treatment benefit may not be fully captured with traditional COAs.

Interviews with clinical trial participants provide the opportunity to more fully explore the impacts of investigational products
  Describe the meaningfulness of treatment-related changes (positive and negative)
  Identify unanticipated treatment benefits

Information regarding pre-study experiences, as well as treatment-related expectations and unmet needs, can also be collected.




Why Do Exit Interviews?

To identify
  Characteristics of (sometimes new or rare) patient populations
  What symptoms/impacts are most important to patients

Allows participants to articulate concepts that may be important to them but that are not obtained (or fully obtained) in the trial, thus
  Enriching researchers' and sponsors' understanding of the patient experience
  Aiding in interpretation of other clinical data

Full impact of treatment (meaningful changes)

Unmet needs of treatment

Expectations for and experiences with disease and of treatment

Thematic information used to inform future COA strategies and clinical trial designs

Potential treatment differentiators




Exit Interviews

Supplement, support, and facilitate the interpretation of data from traditional PRO, PerfO, ObsRO and/or clinical measures
  Provide greater depth and rationale for data from traditional measures
  Describe treatment effects
  Explore the relevance and clinical meaningfulness of specific treatment changes beyond clinical indices and side effects
  Explain anomalous results

Sample Interview Concepts

Patients' (and Caregivers') Experiences With and Attitudes About Treatment

Symptoms/impact prior to study start

Expectations of changes/outcomes
  Can compare pre-study expectations with clinical outcomes

Anticipated or unanticipated benefits, impact of those benefits
  Impact of treatment on daily life/functioning
  Impact of treatment on most important/bothersome symptoms
  Onset of benefits/changes

Treatment experiences
  Convenience of visits, monitoring
  Managing treatment schedule (e.g., regimen schedule, infusions, monitoring)
  Most challenging aspect of study treatment
  Managing adverse events

How well treatment addresses most important/bothersome symptoms

Impact of treatment on daily life/functioning, quality of life

Satisfaction levels with treatment
  Reasons for satisfaction




Potential Applications

When to conduct interviews

Both within and outside the context of a clinical trial
  Implementing as part of a clinical trial is generally more efficient and maximizes participation as compared with a separate or subsequent study

At various time points (not just at the end of a study)
  Baseline, at key time point(s) during the study, at the end of a randomized treatment phase, at the end of open-label extension, etc.

With all participants or select samples of study participants
  Participants can be selected by site, country, experience of a particular side effect, patient-reported data




Approaches to Conducting Patient Interviews

Approach 1: Experienced, trained qualitative researchers conduct interviews
  Interviews conducted via telephone or in-person at designated time(s)
  Can be prospectively planned into the CT protocol or done as a substudy
  Interviews follow a semi-structured guide
  Values of this approach
    Richest source of data, robust methodologically
    Level of granularity from experienced interviewers
    Limits the variability in data quality (vs. large number of individuals with varying degrees of qualitative experience)
    Qualitative analysis usually done by interviewers themselves

Approach 2: Study coordinators (SCs) conduct interviews
  Qualitative interviewers would develop interview guide/related materials, and provide training to SCs
    Certify, demonstrate proficiency
  Use a more standardized and heavily scripted interview guide
  SCs provide field notes, audio recording, etc. to qualitative researchers who analyze qualitative results
  Values of this approach
    Although data may be less in-depth than in Approach 1, particularly effective in global trials in which interview process needs to be scaled to allow for maximal participation
    Allows for interview to be conducted by someone familiar to patient




Issues to Consider in Operationalizing

What questions are you trying to answer with the interviews?
  Exploratory, looking for a signal vs. providing data/support for primary endpoint?
  Do you need patients from all countries to answer your questions or sample of participants?

Population

Sample size

Who is going to conduct interviews?

Method

Timelines

Budget

Senior-management buy-in




Potential Methodological Considerations / Limitations

How, if at all, exit interview activities influence CT data

Self-selection bias of exit interview volunteers (site and patient level)

Sample
  All patients, subsample(s), size

How data will be analyzed

How interview data relate to CT data

Potential for additional adverse event reporting




Factors Contributing to a More Successful Interview Study

General rule of thumb: the more sites and patients, the easier and less expensive it is to recruit

Include prospectively in clinical trial (vs. relying on sites and patients to volunteer their participation)
  Increases site and patient willingness and compliance
  Increases patient sample size
  Interview substudy can be included as a component of a clinical trial for select countries (does not have to be for the entire study)
  Additional protocol amendments and IRB reviews would not be needed
  Does not significantly add to site burden
    Training for interview substudy adds ~30 minutes to site initiation visits




Factors Contributing to a More Successful Interview Study

Adequate time to design interview substudy and materials

Target an adequate sample size (e.g., 30-50 interviewed patients)
  More likely to identify themes/signals (vs. 10-15 patients)

Larger site and patient pool increases likelihood of success
  Easier and more efficient to recruit
  More buy-in from sites and patients

Include in phase 1B or phase 2 study
  Increases chances of early identification of signals (e.g., treatment benefits, impacts)
  Learn what is important to patients that may not be included in protocols
  Early signals can help inform future study design, PRO measurement strategy, selection of other study endpoints, systematic measurement of new endpoints



Exit Interview Study Examples





Example 1: Exit Interviews with COPD and Asthma Patients in Prospective, Real World Clinical Studies

RTI-HS designed and is implementing an exploratory study to capture patient-centered information in the context of two real-world studies being conducted in chronic obstructive pulmonary disease (COPD) and asthma.

The study is investigating the impact and management of COPD and asthma from the patients' perspective and highlighting the potential relationship between treatment and both behavioral and psychological factors on patients' experiences.
  Goal is to identify key risk factors for exacerbations and treatment adherence.

A mixed methods approach is being used:
  Quantitative data are being collected through structured, closed-ended questions administered to all patients via telephone interviews.
  Qualitative data are also being collected through semi-structured, open-ended questions on key topic areas administered to a subset of patients via face-to-face interviews.




Example 2: Interviews with Patients with Diabetic Gastroparesis Before and After Treatment

RTI-HS recently collaborated with a pharmaceutical client developing a new treatment for diabetic gastroparesis (DG)

Participation in qualitative interviews at both the beginning (pre-treatment) and end (post-treatment) of a phase 2 study was offered to all clinical trial participants

Primary objective of the pre-treatment interviews was to inform the development of a new PRO measure or modification of an existing PRO measure by:
  Identifying a comprehensive set of DG symptoms
  Learning how patients describe the burden and natural variation in these symptoms
  Understanding the relative bothersomeness of the symptoms
  Describing expectations related to successful treatment

Primary objective of the post-treatment interviews was to gather in-depth information about participants' experience with the study drug, including the magnitude and relative importance of both positive and negative changes

A manuscript describing the methods and results of this study has just been submitted for publication




Example 3: Exit Interviews with Clinical Trial Participants with Carcinoid Syndrome (CS)

Task: Regulatory requirement that client assess and document the relevance and clinical meaningfulness of specific CS-related symptoms and their impacts

Designed and implemented a qualitative study to explore perceptions and experiences of patients following their participation in a clinical trial.

Conducted telephone exit interviews with 35 patients across 16 sites in 5 countries enrolled in a phase 3 clinical trial investigating a new treatment for carcinoid syndrome to assess:
  Participants' experiences (symptoms and impacts) with their disease
  Perceived benefits of the study treatment
  The clinical meaningfulness of specific symptom improvements and their associated impact to the patients

Mixed methods (qualitative and quantitative data)

Data analyzed
  Qualitative
  Quantitative
  Compared with selected clinical trial data




Example 3: Exit Interviews with Clinical Trial Participants with Carcinoid Syndrome (CS): Results

Supported the primary endpoint of decrease in diarrhea
  The 3 most important symptoms to treat and the most bothersome symptoms were diarrhea, BM frequency, and urgency.
  BM frequency was reported as being more important to treat than stool form/consistency.

Meaningfulness of changes with treatment
  95% of participants who reported reductions in BM frequency noted that this was meaningful to them, allowing them to better enjoy life, leave the house, and participate in social and other activities.
    "I definitely feel like I'm not a prisoner in my house, staying 10 feet to the nearest bathroom. I can go out to activities"
    "But the biggest change is not having to run to the toilet constantly... You can't live going 20 times a day. I was able to go out more often"
  Most participants reported that a BM frequency reduction of at least 30% would be considered meaningful.
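To make the 30% figure concrete, the sketch below computes a percentage reduction in bowel movement (BM) frequency and compares it with that threshold. The function names and the participant values are hypothetical, and treating the interview-derived 30% as a cut-off for individual participants is an assumption made for illustration only.

```python
def percent_reduction(baseline: float, follow_up: float) -> float:
    """Percentage reduction from baseline (positive values mean fewer BMs)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def meets_threshold(baseline: float, follow_up: float, threshold_pct: float = 30.0) -> bool:
    """True when the reduction reaches the chosen threshold (30% by default)."""
    return percent_reduction(baseline, follow_up) >= threshold_pct


# Hypothetical participant: 10 BMs per day at baseline, 6 per day at follow-up.
print(percent_reduction(10, 6))  # 40.0
print(meets_threshold(10, 6))    # True
```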


Exploring the Use of Anchor-Based Methods to Derive and Interpret Meaningful Within-Patient Change

April 4, 2017

Gwaltney Consulting Confidential


ANCHOR-BASED METHODS

"The anchor-based approaches use an external indicator, either clinical or patient-based, to assign subjects into several groupings reflecting no change, small positive changes, large positive changes, small negative changes, or large negative changes in clinical or health status" (Revicki 2008; p. 104)

"[Anchor-based methods] anchor change scores on the COA to an external criterion that identifies study subjects who have experienced an important change in their condition" (PRO Consortium 2015)

Meaningful within-person change = Change on the target COA measure for patients who experience meaningful improvement or worsening on the anchor

Gold standard for estimating meaningful within-person change (FDA 2009)
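The definition above can be sketched as a grouping exercise: collect change scores on the target COA, group them by anchor response, and average within each group. The records below are hypothetical, as is the choice of a PGIC-style anchor with "A Little Better" as the category of interest.

```python
from statistics import mean

# Hypothetical records: (change on the target COA, anchor response).
records = [
    (-4.0, "Moderately Better"), (-3.0, "Moderately Better"),
    (-2.0, "A Little Better"), (-1.5, "A Little Better"), (-2.5, "A Little Better"),
    (0.5, "No Change"), (-0.5, "No Change"), (0.0, "No Change"),
    (1.5, "A Little Worse"),
]


def mean_change_by_anchor(rows):
    """Average COA change score within each anchor category."""
    groups = {}
    for change, category in rows:
        groups.setdefault(category, []).append(change)
    return {category: mean(values) for category, values in groups.items()}


summary = mean_change_by_anchor(records)
print(summary["A Little Better"])  # -2.0
```

Which anchor category marks "meaningful" change is a separate judgment that the code does not settle.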



TYPES OF ANCHORS

Global Impression of Change
  Patient, Caregiver, Clinician Reported

Global Impression of Symptoms
  Patient Reported

Disease Severity Categories
  e.g., New York Heart Association Classification among heart failure patients

Occurrence of a Meaningful Event
  e.g., Hospitalization, disease relapse

Experience of certain degree of change on a disease-related variable
  e.g., Loss of 5% body fat in obese patients (Crosby 2003)



EXAMPLE: PGIC

Please choose the response below that best describes the overall change in your ______ since you started taking the study medication.

Very much Better

Moderately Better

A Little Better

No Change

A Little Worse

Moderately Worse

Very much Worse


ESTIMATING WITHIN-PATIENT MEANINGFUL CHANGE

[Figure: Distribution of Change Scores by PGIC Category]

Farrar 2001; Pain



EXAMPLE: PGIS

Please choose the response below that best describes the severity of your ______ over the past week.

None

Mild

Moderate

Severe

Very Severe



CONSIDERATIONS WHEN SELECTING ANCHORS

Anchors should be easier to interpret than the PRO measure itself (FDA, 2009)

Correlation between anchor and target COA should be greater than 0.30-0.40 (Hays 2005; Revicki 2008)

Should anchor assess change in a specific symptom/function or a more global assessment of health?

Recall bias with impression of change items

Most appropriate anchor type for different types of COAs?

Recommended to use multiple independent anchors and to examine and confirm responsiveness across multiple samples (Revicki 2008)
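The correlation criterion in the list above can be checked with a few lines of code. The sketch below computes a Pearson correlation between a numerically coded anchor and COA change scores and compares its absolute value with 0.30. The data are hypothetical, and coding the anchor as integers from -3 to 3 is an assumption; a rank-based coefficient may be preferred for ordinal anchors.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical anchor ratings (3 = very much better ... -3 = very much worse)
# and change scores on the target COA (negative = symptom reduction).
anchor = [3, 2, 2, 1, 1, 0, 0, -1, -2, 1]
coa_change = [-5.0, -4.0, -2.5, -2.0, -1.0, 0.5, -0.5, 1.0, 2.5, -3.0]

r = pearson_r(anchor, coa_change)
anchor_is_usable = abs(r) >= 0.30   # threshold taken from the list above
print(round(r, 2), anchor_is_usable)
```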



CONSIDERATIONS WHEN USING


Type of analysis to determine meaningful change?
  Descriptive: Average COA score at each level of PGIC
  Formal: Regression analysis, ROC curve

What level of change should be considered as the marker for meaningful change? Minimal? Moderate? Large?

Only use estimate from group that has changed? Difference between changed and stable groups?

Non-linear relationship between anchor and COA score
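One way to read "ROC curve" among the formal analyses is to search for the change-score threshold that best separates patients whom the anchor classifies as improved from those it does not. The sketch below uses Youden's J (sensitivity + specificity - 1) as the selection criterion, which is a choice made for this example rather than something the slides prescribe; the data are hypothetical.

```python
def roc_cut_point(improvement, improved):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    improvement: amount of improvement on the COA (larger = more improvement).
    improved: 1 if the anchor classifies the patient as improved, else 0.
    """
    positives = sum(improved)
    negatives = len(improved) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both improved and not-improved patients are required")
    best = None
    for threshold in sorted(set(improvement)):
        tp = sum(1 for s, y in zip(improvement, improved) if y and s >= threshold)
        tn = sum(1 for s, y in zip(improvement, improved) if not y and s < threshold)
        sens, spec = tp / positives, tn / negatives
        # Keep the first threshold with the highest J; ties go to the lower threshold.
        if best is None or sens + spec > best[1] + best[2]:
            best = (threshold, sens, spec)
    return best


# Hypothetical data: five patients improved on the anchor, five did not.
improvement = [5.0, 4.0, 3.0, 2.5, 1.5, 2.0, 1.0, 0.5, 0.0, -1.0]
improved = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

threshold, sensitivity, specificity = roc_cut_point(improvement, improved)
print(threshold, sensitivity, specificity)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can select a different threshold from the same data.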


NON-LINEAR RELATIONSHIP BETWEEN PGIC AND PRO

[Figure: Distribution of Change Scores by PGIC Category]

Farrar 2001; Pain


NON-LINEAR RELATIONSHIP BETWEEN PGIC AND PRO

[Figure: boxplots by answer to the CGI question (v. much improved, much improved, minim. improved, no change, minim. worse, much worse, v. much worse)]

FIG. 1. Boxplot of absolute change from baseline to week 12 in the number of moderate to severe hot flushes by answer to CGI question. CGI, Clinical Global Impression; v., very; minim., minimally; abs., absolute.

Gerlinger 2012; Menopause

CONSIDERATIONS WHEN USING


Use of cross-sectional approaches?
  e.g., Difference between disease severity categories at single point in time

Effect of unblinding on PGIC rating?

False sense of precision
  Clinical trials are less likely to acknowledge the error associated with estimates

Different anchors can lead to substantially different findings
  How to integrate findings?
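One way to acknowledge the error attached to an anchor-based estimate is to report an interval around it. The sketch below computes a percentile bootstrap confidence interval for the mean change score in a single anchor category. The change scores are hypothetical, and the bootstrap is a general-purpose technique chosen for this example rather than a method named in the slides.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical change scores for patients who answered "A Little Better".
little_better_changes = [-2.0, -1.5, -2.5, -3.0, -1.0, -2.0, -0.5, -3.5, -2.0, -1.5]

point_estimate = mean(little_better_changes)
ci_low, ci_high = bootstrap_ci(little_better_changes)
print(point_estimate, ci_low, ci_high)
```

With only ten patients in the category the interval is wide, which is the kind of uncertainty a single reported threshold can hide.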



REFERENCES

Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol. 2003 May;56(5):395-407.

Farrar JT, Young JP Jr, LaMoreaux L, Werth JL, Poole RM. Clinical importance of changes in chronic pain intensity measured on an 11-point numerical pain rating scale. Pain. 2001 Nov;94(2):149-58.

Gerlinger C, Gude K, Hiemeyer F, Schmelter T, Schäfers M. An empirically validated responder definition for the reduction of moderate to severe hot flushes in postmenopausal women. Menopause. 2012 Jul;19(7):799-803.

Hays RD, Farivar SS, Liu H. Approaches and recommendations for estimating minimally important differences for health-related quality of life measures. COPD. 2005 Mar;2(1):63-7.

PRO Consortium 2015. Interpreting Change in Scores on COA Endpoint Measures

Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008 Feb;61(2):102-9.




