RATING QUALITY OF EVIDENCE AND STRENGTH OF RECOMMENDATIONS IN HEPATOLOGY USING THE GRADE FRAMEWORK AASLD Practice Guidelines Committee Meeting, Chicago 1 May 2009 Yngve Falck-Ytter, M.D. Case Western Reserve University



Disclosure

In the past 5 years, Dr. Falck-Ytter received no personal payments for services from industry. His research group received research grants from Three Rivers, Valeant and Roche that were deposited into non-profit research accounts. He is a member of the GRADE working group which has received funding from various governmental entities in the US and Europe. Some of the GRADE work he has done is supported in part by grant # 1 R13 HS016880-01 from the Agency for Healthcare Research and Quality (AHRQ).

Content

Part 1
• Background and rationale for revisiting guideline methodology
• GRADE approach
• Quality of evidence
• Strength of recommendations

Content (continued)

Part 2 – practical consideration
• Ideal vs. practical ad hoc approaches
• Funding guideline work
• Creating GRADE evidence profiles with GRADEpro
• GRADE and diagnostic tests

Reassessment of clinical practice guidelines

Editorial by Shaneyfelt and Centor (JAMA 2009):
• “Too many current guidelines have become marketing and opinion-based pieces…”
• “AHA CPG: 48% of recommendations are based on level C = expert opinion…”
• “…clinicians do not use CPG […] greater concern […] some CPG are turned into performance measures…”
• “Time has come for CPG development to again be centralized, e.g., AHRQ…”

Evidence based clinical decisions

• Research evidence
• Patient values and preferences
• Clinical state and circumstances
• Expertise

Equal for all

Haynes et al. 2002

Confidence in evidence

• There always is evidence: “When there is a question there is evidence”
• Evidence alone is never sufficient to make a clinical decision
• Better research leads to greater confidence in the evidence and decisions

Hierarchy of evidence

STUDY DESIGN
• Randomized Controlled Trials
• Cohort Studies and Case Control Studies
• Case Reports and Case Series, Non-systematic observations

[Figure labels: BIAS; Expert Opinion]

Reasons for grading evidence?

People draw conclusions about the quality of evidence and strength of recommendations

Systematic and explicit approaches can help to:
• protect against errors, resolve disagreements
• communicate information and fulfill needs
• be transparent about the process

Change practitioner behavior

However, wide variation in approaches

GRADE working group. BMJ. 2004 & 2008


Which grading system?

P: In patients with acute hepatitis C …
I: Should anti-viral treatment be used …
C: Compared to no treatment …
O: To achieve viral clearance?

Organization    Evidence    Recommendation
AASLD (2009)    B           Class I
VA (2006)       II-1        -/-
SIGN (2006)     1+          A
AGA (2006)      -/-         “Most authorities…”

Scenario (2)

Should patients with risk factors for viral hepatitis be screened with a hepatitis C antibody (ELISA) test to identify patients with past hepatitis C exposure?


Level of evidence in GI CPGs

AASLD
A  Multiple RCTs or meta-analysis
B  Single randomized trial, or non-randomized studies
C  Only consensus opinion of experts, case studies, or standard-of-care

AGA
Good  Consistent, well-designed, well conducted studies […]
Fair  Limited by the number, quality or consistency of individual studies […]
Poor  … important flaws, gaps in chain of evidence…

ACG
1. Multiple published, well-controlled (?) randomized trials or a well designed systemic (?) meta-analysis
2. One quality-published (?) RCT, published well-designed cohort/case-control studies
3. Consensus of authoritative (?) expert opinions based on clinical evidence or from well designed, but uncontrolled or non-rand. clin. trials

ASGE
A. RCTs
B. RCT with important limitations
C. Observational studies
D. Expert opinion

What to do?


Limitations of existing systems

Confuse quality of evidence with strength of recommendations

Lack well-articulated conceptual framework

Criteria not comprehensive or transparent

GRADE unique:
• breadth, intensity of development process
• wide endorsement and use
• conceptual framework
• comprehensive, transparent criteria

Focus on all important outcomes related to a specific question and overall quality

Grades of Recommendation Assessment, Development and Evaluation

GRADE Working Group

David Atkins, chief medical officer (a)
Dana Best, assistant professor (b)
Martin Eccles, professor (d)
Francoise Cluzeau, lecturer (x)
Yngve Falck-Ytter, associate director (e)
Signe Flottorp, researcher (f)
Gordon H Guyatt, professor (g)
Robin T Harbour, quality and information director (h)
Margaret C Haugh, methodologist (i)
David Henry, professor (j)
Suzanne Hill, senior lecturer (j)
Roman Jaeschke, clinical professor (k)
Regina Kunz, Associate Professor
Gillian Leng, guidelines programme director (l)
Alessandro Liberati, professor (m)
Nicola Magrini, director (n)
James Mason, professor (d)
Philippa Middleton, honorary research fellow (o)
Jacek Mrukowicz, executive director (p)
Dianne O’Connell, senior epidemiologist (q)
Andrew D Oxman, director (f)
Bob Phillips, associate fellow (r)
Holger J Schünemann, professor (g, s)
Tessa Tan-Torres Edejer, medical officer (t)
David Tovey, Editor (y)
Jane Thomas, Lecturer, UK
Helena Varonen, associate editor (u)
Gunn E Vist, researcher (f)
John W Williams Jr, professor (v)
Stephanie Zaza, project director (w)

a) Agency for Healthcare Research and Quality, USA
b) Children's National Medical Center, USA
c) Centers for Disease Control and Prevention, USA
d) University of Newcastle upon Tyne, UK
e) German Cochrane Centre, Germany
f) Norwegian Centre for Health Services, Norway
g) McMaster University, Canada
h) Scottish Intercollegiate Guidelines Network, UK
i) Fédération Nationale des Centres de Lutte Contre le Cancer, France
j) University of Newcastle, Australia
k) McMaster University, Canada
l) National Institute for Clinical Excellence, UK
m) Università di Modena e Reggio Emilia, Italy
n) Centro per la Valutazione della Efficacia della Assistenza Sanitaria, Italy
o) Australasian Cochrane Centre, Australia
p) Polish Institute for Evidence Based Medicine, Poland
q) The Cancer Council, Australia
r) Centre for Evidence-based Medicine, UK
s) National Cancer Institute, Italy
t) World Health Organisation, Switzerland
u) Finnish Medical Society Duodecim, Finland
v) Duke University Medical Center, USA
w) Centers for Disease Control and Prevention, USA
x) University of London, UK
y) BMJ Clinical Evidence, UK

GRADE uptake

Where GRADE fits in

• Prioritize problems, establish panel
• Systematic review
• Searches, selection of studies, data collection and analysis
• Assess the relative importance of outcomes
• Prepare evidence profile: Quality of evidence for each outcome and summary of findings
• Assess overall quality of evidence
• Decide direction and strength of recommendation
• Draft guideline
• Consult with stakeholders and / or external peer reviewer
• Disseminate guideline
• Implement the guideline and evaluate

[Diagram side label: GRADE]

GRADE: Quality of evidence

The extent to which our confidence in an estimate of the treatment effect is adequate to support a particular recommendation.

Although the degree of confidence is a continuum, we suggest using four categories:

High Moderate Low Very low

I B II V III

Quality of evidence across studies

Outcome #1: Quality: High
Outcome #2: Quality: Moderate
Outcome #3: Quality: Low

Determinants of quality

RCTs start high

Observational studies start low

What lowers quality of evidence? 5 factors:
• Detailed design and execution
• Inconsistency of results
• Indirectness of evidence
• Imprecision
• Publication bias

What is the study design?

Types of studies

Did investigator assign exposure?
• Yes: Experimental study
  Random allocation?
  - Yes: RCT
  - No: CCT
• No: Observational study
  Comparison group?
  - Yes: Analytical study
    Direction?
    · Cohort study: Exposure → Outcome
    · Case-control study: Exposure ← Outcome
    · Cross-sectional study: Exposure and outcome at the same time
  - No: Case-series

Before and after study
Variations: cBAS, ITS
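The decision tree on this slide can be written as a small function. This is only an illustrative sketch: the function name, argument names, and the direction labels ("forward", "backward", "same_time") are hypothetical and not part of the slides, and before-and-after designs are left out.

```python
def classify_study_design(assigned_exposure, random_allocation=False,
                          comparison_group=False, direction=None):
    """Walk the study-design decision tree and return a design label.

    direction is only consulted for observational studies with a
    comparison group (hypothetical labels):
      "forward"   -> start from exposure, follow to outcome (cohort)
      "backward"  -> start from outcome, look back at exposure (case-control)
      "same_time" -> exposure and outcome measured together (cross-sectional)
    """
    if assigned_exposure:
        # Experimental study: the investigator assigned the exposure.
        return "RCT" if random_allocation else "CCT"
    if not comparison_group:
        # Observational study without a comparison group.
        return "Case-series"
    designs = {
        "forward": "Cohort study",
        "backward": "Case-control study",
        "same_time": "Cross-sectional study",
    }
    if direction not in designs:
        raise ValueError("direction must be one of: " + ", ".join(designs))
    return designs[direction]


print(classify_study_design(True, random_allocation=True))
print(classify_study_design(False, comparison_group=True, direction="backward"))
```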

1. Design and execution

Study limitations (risk of bias)

For RCTs:
• Lack of allocation concealment
• No true intention to treat principle
• Inadequate blinding
• Loss to follow-up
• Early stopping for benefit

For observational studies:
• Selection
• Comparability
• Exposure/outcome

Avoid critical appraisal scoring tools!

Jadad AR et al. Control Clin Trials 1996

Tools: scales and checklists

Example: Jadad score

Was the study described as randomized? 1
Adequate description of randomization? 1
Double blind? 1
Method of double blinding described? 1
Description of withdrawals and dropouts? 1

Max 5 points for quality
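As a rough illustration of how such a checklist collapses into a single number, the five items as listed on this slide can be summed as below. The item keys and function name are hypothetical, and the sketch follows the simplified one-point-per-item version shown here, not the full published instrument. Note that the preceding slide advises against relying on scoring tools of this kind.

```python
# Hypothetical keys for the five yes/no items listed on the slide.
JADAD_ITEMS = (
    "described_as_randomized",
    "randomization_described",
    "double_blind",
    "blinding_method_described",
    "withdrawals_described",
)


def jadad_score(answers):
    """Return one point per item answered True (maximum 5).

    answers: dict mapping item key -> bool; missing items count as False.
    """
    return sum(1 for item in JADAD_ITEMS if answers.get(item, False))


example = {"described_as_randomized": True, "double_blind": True}
print(jadad_score(example))
```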

Schulz KF et al. JAMA 1995

Allocation concealment

250 RCTs out of 33 meta-analyses

Allocation concealment    Effect (Ratio of OR)
adequate                  1.00 (Ref.)
unclear                   0.67 [0.60 – 0.75] *
not adequate              0.59 [0.48 – 0.73] *

* significant
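One way to read the table: a ratio of odds ratios below 1.00 means the trials in that category reported, on average, more favorable treatment effects than trials with adequate concealment. The sketch below converts each ratio into a percentage exaggeration; the function name is hypothetical and the arithmetic is simply (1 − ratio) × 100.

```python
def exaggeration_percent(ratio_of_or):
    """Percentage by which the odds ratio is exaggerated relative to the
    reference category (adequate concealment, ratio 1.00)."""
    return round((1.0 - ratio_of_or) * 100)


for label, ratio in [("adequate", 1.00), ("unclear", 0.67), ("not adequate", 0.59)]:
    print(label, exaggeration_percent(ratio), "%")
```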

5 vs 4 chemo-Rx cycles for AML

Studies stopped early because of benefit

Cochrane Risk of bias graph in RevMan 5


2. Consistency of results

Look for explanation for inconsistency:
• patients, intervention, comparator, outcome, methods

Judgment:
• variation in size of effect
• overlap in confidence intervals
• statistical significance of heterogeneity
• I²
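For readers unfamiliar with I², a minimal sketch of the usual calculation follows: Cochran's Q from inverse-variance weights, then I² = max(0, (Q − df) / Q) × 100. The function names and the example numbers are made up for illustration, and the slides do not prescribe any particular cut-off.

```python
def cochran_q(effects, standard_errors):
    """Cochran's Q: weighted squared deviations from the pooled
    (fixed-effect, inverse-variance) estimate."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(effects, standard_errors):
    """I² in percent: share of variability beyond what chance would explain."""
    q = cochran_q(effects, standard_errors)
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log odds ratios and standard errors from three studies.
print(i_squared([-0.5, -0.5, -0.5], [0.2, 0.3, 0.25]))  # identical effects
print(i_squared([-1.0, 0.0, 1.0], [0.1, 0.1, 0.1]))     # widely scattered effects
```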

Pagliaro L et al. Ann Intern Med 1992;117:59-70

Heterogeneity

3. Directness of Evidence

Indirect comparisons
• Interested in head-to-head comparison
• Drug A versus drug B
• Tenofovir versus entecavir in hepatitis B treatment

Differences in
• patients (early cirrhosis vs end-stage cirrhosis)
• interventions (CRC screening: flex. sig. vs colonoscopy)
• comparator (e.g., differences in dose)
• outcomes (non-steroidal safety: ulcer on endoscopy vs symptomatic ulcer complications)

4. Imprecision

• Small sample size
• small number of events
• wide confidence intervals
• uncertainty about magnitude of effect

Imprecision

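[Figure: confidence intervals on a relative risk (RR) axis marked at 0.75, 1.00 and 1.25, with regions labeled "appreciable benefit" and "appreciable harm" and example intervals labeled "imprecise" and "precise"]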
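The figure can be read as a simple rule of thumb: an interval is imprecise when it includes no effect (RR = 1.00) and also reaches into appreciable benefit or appreciable harm. The sketch below encodes that reading. The thresholds 0.75 and 1.25 come from the axis of the figure; the function name and the exact form of the rule are assumptions for illustration.

```python
def is_imprecise(ci_low, ci_high, benefit=0.75, harm=1.25):
    """Flag a relative-risk confidence interval as imprecise when it
    includes RR = 1 and also extends past a threshold for appreciable
    benefit or appreciable harm."""
    includes_no_effect = ci_low <= 1.0 <= ci_high
    reaches_appreciable = ci_low < benefit or ci_high > harm
    return includes_no_effect and reaches_appreciable


print(is_imprecise(0.60, 1.30))  # wide interval spanning no effect
print(is_imprecise(0.90, 1.10))  # narrow interval around no effect
print(is_imprecise(0.50, 0.70))  # interval entirely within appreciable benefit
```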

[Figure: total number of events required (y-axis, 0 to 600) plotted against control group event rate (x-axis, 0.0 to 1.0), with curves for RRR = 30%, RRR = 25% and RRR = 20%; annotation: 300 events]
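Curves of this kind can be approximated with the standard sample-size formula for comparing two proportions. The sketch below is an assumption-laden illustration: the two-sided alpha of 0.05 and power of 0.80 are not stated on the slide, and the function name is hypothetical.

```python
from statistics import NormalDist


def events_required(control_rate, rrr, alpha=0.05, power=0.80):
    """Approximate total number of events (both arms) needed to detect a
    relative risk reduction `rrr` at a given control group event rate.

    alpha and power are assumed defaults, not values from the slides.
    """
    if not 0 < control_rate < 1 or not 0 < rrr < 1:
        raise ValueError("control_rate and rrr must lie strictly between 0 and 1")
    p1 = control_rate
    p2 = control_rate * (1 - rrr)          # event rate in the treated arm
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_per_group = z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return n_per_group * (p1 + p2)         # expected events across both arms


for rrr in (0.20, 0.25, 0.30):
    print(rrr, round(events_required(0.2, rrr)))
```

Under these assumptions, smaller relative risk reductions require more events, which matches the ordering of the curves in the figure.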

5. Reporting Bias (Publication Bias)

• Reporting of studies: publication bias, number of small studies
• Reporting of outcomes

I.V. Mg in acute myocardial infarction

Publication bias

• Meta-analysis: Yusuf S. Circulation 1993
• ISIS-4: Lancet 1995

Egger M, Smith DS. BMJ 1995;310:752-54

Funnel plot

[Figure: funnel plot of standard error (y-axis, 0 to 3) against odds ratio (x-axis, log scale: 0.1, 0.3, 0.6, 1, 3, 10)]

Symmetrical: No reporting bias

Egger M, Cochrane Colloquium Lyon 2001

Funnel plot

[Figure: funnel plot of standard error (y-axis, 0 to 3) against odds ratio (x-axis, log scale: 0.1, 0.3, 0.6, 1, 3, 10)]

Asymmetrical: Reporting bias?

Egger M, Cochrane Colloquium Lyon 2001
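Funnel plot asymmetry is usually judged by eye, but it can also be summarized numerically. One common approach, often called Egger's regression test, regresses each study's standardized effect (effect divided by its standard error) on its precision (1 divided by the standard error); an intercept far from zero suggests asymmetry. The slides do not state that this method was used, so the sketch below is illustrative only, with hypothetical data and no significance test.

```python
def egger_intercept(log_effects, standard_errors):
    """Intercept of an ordinary least-squares regression of the
    standardized effect (effect / SE) on precision (1 / SE)."""
    precision = [1.0 / se for se in standard_errors]
    snd = [y / se for y, se in zip(log_effects, standard_errors)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(snd) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, snd))
    slope = sxy / sxx
    return mean_y - slope * mean_x


ses = [0.1, 0.2, 0.4, 0.8]  # hypothetical standard errors, large to small studies
# Same log odds ratio in every study: symmetric funnel, intercept near 0.
print(egger_intercept([-0.5, -0.5, -0.5, -0.5], ses))
# Smaller studies show larger effects: asymmetric funnel, intercept away from 0.
print(egger_intercept([-0.1, -0.3, -0.8, -1.5], ses))
```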

I.V. Mg in acute myocardial infarction

Reporting bias

• Meta-analysis: Yusuf S. Circulation 1993
• ISIS-4: Lancet 1995

Egger M, Smith DS. BMJ 1995;310:752-54

Quality assessment criteria

Study design and starting quality of evidence:
• Randomized trial: High (4)
• Observational study: Low (2)

Quality of evidence: High (4), Moderate (3), Low (2), Very low (1)

Lower if…
• Study limitations (design and execution)
• Inconsistency
• Indirectness
• Imprecision
• Publication bias

Higher if…

What can raise the quality of evidence?

BMJ 2003;327:1459–61

Quality assessment criteria

Study design and starting quality of evidence:
• Randomized trial: High (4)
• Observational study: Low (2)

Quality of evidence: High (4), Moderate (3), Low (2), Very low (1)

Lower if…
• Study limitations
• Inconsistency
• Indirectness
• Imprecision
• Publication bias

Higher if…
• Large effect (e.g., RR 0.5)
• Very large effect (e.g., RR 0.2)
• Evidence of dose-response gradient
• All plausible confounding would reduce a demonstrated effect
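The criteria above amount to a bounded point scale: start at 4 for randomized trials or 2 for observational studies, subtract levels for the "lower if" factors, add levels for the "higher if" factors, and stay within 1 to 4. The sketch below shows that arithmetic. How many levels each factor moves the rating is a judgment the slide does not spell out, so the caller supplies totals; the function and parameter names are hypothetical.

```python
LEVELS = {4: "High", 3: "Moderate", 2: "Low", 1: "Very low"}
STARTING_SCORE = {"randomized trial": 4, "observational study": 2}


def rate_quality(study_design, downgrades=0, upgrades=0):
    """Return the quality category for one outcome.

    downgrades / upgrades: total number of levels to move down or up,
    as judged by the reviewer (an assumption of this sketch).
    """
    score = STARTING_SCORE[study_design] - downgrades + upgrades
    score = max(1, min(4, score))  # keep the rating between Very low and High
    return LEVELS[score]


print(rate_quality("randomized trial"))
print(rate_quality("randomized trial", downgrades=2))
print(rate_quality("observational study", upgrades=1))
```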

Categories of quality

High: Further research is very unlikely to change our confidence in the estimate of effect

Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate

Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate

Very low: Any estimate of effect is very uncertain

Judgments about the overall quality of evidence

Most systems not explicit

Options:
• Benefits
• Primary outcome
• Highest
• Lowest

Beyond the scope of a systematic review

GRADE: Based on lowest of all the critical outcomes
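The GRADE rule stated here, overall quality equals the lowest quality among the critical outcomes, is easy to express directly. In the sketch below the function name, the tuple layout, and the example outcomes are hypothetical.

```python
QUALITY_RANK = {"Very low": 1, "Low": 2, "Moderate": 3, "High": 4}


def overall_quality(outcomes):
    """Return the lowest quality rating among critical outcomes.

    outcomes: iterable of (name, importance, quality) tuples, where
    importance is "critical", "important" or "not important".
    """
    critical = [quality for _, importance, quality in outcomes
                if importance == "critical"]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=QUALITY_RANK.__getitem__)


# Hypothetical evidence profile for illustration only.
profile = [
    ("mortality", "critical", "High"),
    ("viral clearance", "critical", "Moderate"),
    ("nausea", "important", "Very low"),
]
print(overall_quality(profile))
```

In this example the "Very low" rating for the non-critical outcome does not pull the overall rating down; only the critical outcomes count.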

GRADE evidence profile

Going from evidence to recommendations

Deliberate separation of quality of evidence from strength of recommendation

No automatic one-to-one connection as in other grading systems

Example: What if there is high quality evidence, but benefits and risks are finely balanced?


Strength of recommendation

“The strength of a recommendation reflects the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable effects of a management strategy outweigh undesirable effects.”

Although the strength of recommendation is a continuum, we suggest using two categories:

“Strong” and “Weak”

Desirable and undesirable effects

Desirable effects
• Mortality reduction
• Improvement in quality of life, fewer hospitalizations/infections
• Reduction in the burden of treatment
• Reduced resource expenditure

Undesirable effects
• Deleterious impact on morbidity, mortality or quality of life, increased resource expenditure

4 determinants of the strength of recommendation

Factors that can weaken the strength of a recommendation, with explanation:

• Lower quality evidence: The higher the quality of evidence, the more likely is a strong recommendation.

• Uncertainty about the balance of benefits versus harms and burdens: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.

• Uncertainty or differences in values: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.

• Uncertainty about whether the net benefits are worth the costs: The higher the costs of an intervention – that is, the more resources consumed – the less likely a strong recommendation is warranted.

Developing recommendations

Implications of a strong recommendation

Patients: Most people in this situation would want the recommended course of action and only a small proportion would not

Clinicians: Most patients should receive the recommended course of action

Policy makers: The recommendation can be adopted as a policy in most situations

Implications of a weak recommendation

Patients: The majority of people in this situation would want the recommended course of action, but many would not

Clinicians: Be prepared to help patients to make a decision that is consistent with their own values/decision aids and shared decision making

Policy makers: There is a need for substantial debate and involvement of stakeholders

6 main misconceptions

1. Isn’t GRADE expensive to realize?

2. Isn’t GRADE more complicated, takes longer and requires more resources?

3. Isn’t GRADE eliminating the expert?

4. But what about prevalence/burden of disease, diagnosis, cost?

5. But GRADE does not have an “insufficient evidence to make recommendation” category! (or: the “optional” category), no?

6. But we only “recommend” – we can’t possibly give weak recommendations!

Systematic review
Guideline development

1. Formulate question (PICO)
2. Select outcomes
3. Rate importance of each outcome: Critical / Important / Critical / Not important
4. Outcomes across studies: create evidence profile with GRADEpro (summary of findings & estimate of effect for each outcome)
5. Rate quality of evidence for each outcome: High / Moderate / Low / Very low
   RCT start high, obs. data start low
   Grade down:
   1. Risk of bias
   2. Inconsistency
   3. Indirectness
   4. Imprecision
   5. Publication bias
   Grade up:
   1. Large effect
   2. Dose response
   3. Confounders
6. Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
7. Panel: formulate recommendations
   • For or against (direction)
   • Strong or weak (strength)
   By considering:
   • Quality of evidence
   • Balance benefits/harms
   • Values and preferences
   Revise if necessary by considering:
   • Resource use (cost)

Wording:
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
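The four phrasings in the diagram combine a direction (for or against) with a strength (strong or weak). The sketch below assumes the usual GRADE convention that "recommend" signals a strong and "suggest" a weak recommendation; the slide lists the phrases without stating that mapping, and the function name is hypothetical.

```python
def recommendation_phrase(direction, strength):
    """Build the opening words of a recommendation.

    direction: "for" or "against"
    strength:  "strong" or "weak" (assumed to map to recommend / suggest)
    """
    verb = {"strong": "recommend", "weak": "suggest"}[strength]
    action = {"for": "using", "against": "against using"}[direction]
    return f"We {verb} {action}…"


for direction in ("for", "against"):
    for strength in ("strong", "weak"):
        print(recommendation_phrase(direction, strength))
```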

Conclusions

1. GRADE is gaining acceptance as international standard

2. GRADE has criteria for evidence assessment across questions (e.g., public health interventions) and outcomes

3. Criteria for moving from evidence to recommendations

4. Simple, transparent, systematic

5. Balance between simplicity and methodological rigor