Grading the quality of evidence Yngve Falck-Ytter Regina Kunz Holger Schünemann Utrecht, September 18, 2008 GRADE workshop Julius Centrum, UMC Utrecht


Content

Why the usual hierarchies to grade quality of evidence are problematic

How GRADE does it differently

Why judgments are still required

Why GRADEing quality of evidence is easier than you might think

Oxford Centre of Evidence Based Medicine; http://www.cebm.net

Before GRADE

Level of evidence    Source of evidence
I                    SR, RCTs
II                   Cohort studies
III                  Case-control studies
IV                   Case series
V                    Expert opinion

Grades of recommendation: A, B, C, D

From Levels to Grades

Grades equal to levels: 45%

Increased grades 1 or 2 steps: 48%

• Increased grades from C to A: 27%

• Increased grades from B to A or C to B: 21%

Decreased grades 1 or 2 steps: 7%

Committee of Ministers of the Council of Europe. Oct 2001.

Grading used in GI CPGs

AASLD

• I: RCTs

• II-1: Controlled trials (no randomization)

• II-2: Cohort or case-control analytical studies

• II-3: Multiple time series, dramatic uncontrolled experiments

• III: Opinion of respected authorities, descriptive epidemiology

AGA

• I: RCTs, well designed, n↑ for sufficient statistical power

• II: 1 large well-designed clinical trial (+/- randomization), cohort or case-control studies, or well-designed meta-analysis

• III: Clinical experience, descriptive studies, expert committees

• IV: Not rated

ACG

• I: Systematic review of RCTs

• II: 1+ properly designed RCT, n↑, clinical setting

• III: Published, well-designed trials, pre-post, cohort, time series, case-control studies

• IV: Non-experimental studies from >1 center/group, opinion of respected authorities, clinical evidence, descriptive studies, expert consensus committees

ASGE

• A: Prospective controlled trials

• B: Observational studies

• C: Expert opinion

Adapted from: Oxford Centre of Evidence Based Medicine; http://www.cebm.net

Levels of evidence

Oxford Centre of EBM

Level of evidence    Source of evidence
Ia                   Systematic reviews (meta-analyses) of RCTs
Ib                   Randomized controlled trials
II                   Cohort studies
III                  Case-control studies
IV                   Case series
V                    Expert opinion

[Figure: the label "Bias" is attached to each of the six levels]

GRADE: Quality of evidence

The extent to which one can be confident that an estimate of effect or association is correct.

Although the degree of confidence is a continuum, we suggest using four categories:

• High

• Moderate

• Low

• Very low

Quality of evidence across studies

Outcome #1: Quality: High

Outcome #2: Quality: Moderate

Outcome #3: Quality: Low

Determinants of quality

RCTs start high

Observational studies start low

What lowers quality of evidence? 5 factors:

• Detailed study design and execution

• Inconsistency

• Indirectness

• Publication bias

• Imprecision


What is the study design?

Design and Execution

Limitations

• Lack of allocation concealment

• No true intention to treat principle

• Inadequate blinding

• Loss to follow-up

• Early stopping for benefit

Schulz KF et al. JAMA 1995

Allocation concealment

250 RCTs out of 33 meta-analyses

Allocation concealment    Effect (ratio of ORs)
adequate                  1.00 (Ref.)
unclear                   0.67 [0.60 – 0.75] *
not adequate              0.59 [0.48 – 0.73] *

* significant
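A ratio of odds ratios below 1 means that trials in that category reported a more favorable odds ratio than trials with adequate concealment. The sketch below turns the ratios on the slide into a percentage exaggeration, assuming outcomes where an odds ratio below 1 favors treatment. The function name is illustrative only.

```python
def exaggeration_percent(ratio_of_odds_ratios: float) -> float:
    """Percent by which the odds ratio is exaggerated relative to the reference group."""
    return (1.0 - ratio_of_odds_ratios) * 100.0


# Ratios of odds ratios from the slide (reference: adequate concealment).
ror_by_concealment = {"adequate": 1.00, "unclear": 0.67, "not adequate": 0.59}

for label, ror in ror_by_concealment.items():
    print(f"{label}: effect exaggerated by about {exaggeration_percent(ror):.0f}%")
```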

Sackett, Gent. NEJM 1979

Bias

Fields et al 1970: RCT

• 167 pts with bilateral stenosis of the carotids+TIA

• Surgical vs medical management

• 151 pt analysed per protocol:

RRR (TIA, CVA, death ): 26% [6%, 42%], p = 0.01

• Outcome: pt had to be d/c‘ed alive and without TIA/CVA

• 15 in surg. group/1 in med. management excluded:

ITT: RRR (TIA, CVA, death): 17% [-3%, 32%], p = 0.09

Noseworthy et al. Neurology 1994

Another bias

MS: Plasmapheresis, Cyclophosphamide, Prednisone vs. Placebo

Neurologist: p values

Follow up     Outcome assessment unblinded    Outcome assessment blinded
6 months      < 0.05                          NS
12 months     < 0.005                         NS
18 months     NS                              NS
24 months     < 0.05                          NS

What is double blind?

• Participants: Bias through other effective interventions, differential reporting of symptoms, dropping out

• Health care providers: Differentially prescribing effective co-interventions, influence compliance with follow-up, influence patient reports

• Data collectors: Differential encouragement, timing/frequency of outcomes assessment, differential recording of outcomes

• Judicial assessors of outcomes: Differential assessment of outcome

• Data analyst: Differential decisions on patient withdrawal, post hoc selection of outcomes or analytic approaches, selection of time points

• Data safety and monitoring committee: Differential decisions to continue or stop the trial

• Manuscript writers: May reduce biases in the presentation and interpretation of results

Quality issues

[Figure: flow of a two-arm trial: Baseline, Allocation (A or B), Intervention vs. no intervention, Follow up, Outcome]

Question              Quality issue
Random?               Sequence generation
Selection bias?       Allocation concealment
Performance bias?     Blinding/Masking
Attrition bias?       Intention-to-treat analysis
Detection bias?       Blinding/Masking


Studies stopped early because of benefit

Example: 5 vs 4 chemo-Rx cycles for AML



Consistency of results

If inconsistency, look for explanation

• patients, intervention, outcome, methods

How to analyze

• Differences in effect size

• Overlap of confidence intervals

• Chi-square of homogeneity

• I-squared

Unexplained inconsistency: downgrade quality
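As a rough illustration of the chi-square of homogeneity (Cochran's Q) and I-squared, the sketch below pools study effects with inverse-variance weights and reports how much of the variation exceeds what chance alone would explain. The three log odds ratios and standard errors are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def heterogeneity(log_effects, standard_errors):
    """Return (Q, degrees of freedom, I-squared in percent) for inverse-variance pooling."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I-squared: share of Q in excess of its expected value under homogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical log odds ratios and standard errors for three studies.
q, df, i_squared = heterogeneity([-0.5, -0.1, 0.3], [0.2, 0.2, 0.2])
print(f"Q = {q:.1f} on {df} df, I-squared = {i_squared:.0f}%")
```

A high I-squared only flags inconsistency. Whether it is explained by patients, intervention, outcome, or methods remains a judgment.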

Pagliaro L et al. Ann Intern Med 1992;117:59-70

Heterogeneity



Directness of Evidence

Indirect comparisons

• Interested in head-to-head comparison

• Drug A versus drug B

• Infliximab versus adalimumab in Crohn’s disease

Differences in

• patients (early cirrhosis vs end-stage cirrhosis)

• interventions (CRC screening: flex. sig. vs colonoscopy)

• outcomes (non-steroidal safety: ulcer on endoscopy vs symptomatic ulcer complications)



Egger M, Smith DS. BMJ 1995;310:752-54

Publication bias

I.V. Mg in acute myocardial infarction

Meta-analysis: Yusuf S. Circulation 1993

ISIS-4: Lancet 1995

Egger M, Cochrane Colloquium Lyon 2001

Funnel plot

[Figure: funnel plot with standard error on the y-axis and odds ratio on the x-axis (log scale)]

Symmetrical: No reporting bias

Egger M, Cochrane Colloquium Lyon 2001

Funnel plot

[Figure: funnel plot with standard error on the y-axis and odds ratio on the x-axis (log scale)]

Asymmetrical: Reporting bias?

Egger M, Smith DS. BMJ 1995;310:752-54

Reporting bias

I.V. Mg in acute myocardial infarction

Meta-analysis: Yusuf S. Circulation 1993

ISIS-4: Lancet 1995



Imprecision

• Small sample size

• Small number of events

• Wide confidence intervals

• Uncertainty about magnitude of effect

Is this an example of imprecision?

• RCT: clopidogrel vs aspirin

• 19,185 patients at risk of vascular events

• Clopidogrel: 939 (5.32%) had major vascular event

• Aspirin: 1,021 (5.83%)

• RR of 0.91 (95% CI 0.83 – 0.99).
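To see where an interval like this comes from, the sketch below recomputes the relative risk and an approximate 95% CI from the numbers on the slide. It assumes the percentages are event rates and uses a normal approximation on the log rate ratio based on the event counts; the trial's own analysis may have used a different method, so small differences from the slide are expected.

```python
import math


def rate_ratio_ci(events_a, rate_a, events_b, rate_b, z=1.96):
    """Rate ratio (a vs b) with an approximate CI from event counts (log-normal approximation)."""
    rr = rate_a / rate_b
    se_log_rr = math.sqrt(1.0 / events_a + 1.0 / events_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Clopidogrel: 939 events, 5.32%; aspirin: 1,021 events, 5.83% (from the slide).
rr, lower, upper = rate_ratio_ci(939, 5.32, 1021, 5.83)
print(f"RR = {rr:.2f}, approximate 95% CI {lower:.3f} to {upper:.3f}")
```

Even with nearly 2,000 events, the upper limit sits very close to 1, which is why a large trial can still raise the question of imprecision.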

[Figure: total number of events required (y-axis, 0 to 600) against control group event rate (x-axis, 0.0 to 1.0), with curves for RRR = 30%, RRR = 25%, and RRR = 20%; 300 events is marked]
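A curve of this kind can be approximated with a standard two-proportion sample size formula. The sketch below is one such approximation, assuming a two-sided alpha of 0.05 and 80% power; these settings and the function name are assumptions, not values taken from the figure.

```python
from statistics import NormalDist


def total_events_required(control_event_rate, rrr, alpha=0.05, power=0.80):
    """Approximate total events (both arms) needed to detect a given relative risk reduction."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    p_c = control_event_rate
    p_t = p_c * (1.0 - rrr)  # event rate in the treated arm
    variance_sum = p_c * (1.0 - p_c) + p_t * (1.0 - p_t)
    n_per_group = (z_alpha + z_beta) ** 2 * variance_sum / (p_c - p_t) ** 2
    return n_per_group * (p_c + p_t)


for rrr in (0.30, 0.25, 0.20):
    events = total_events_required(control_event_rate=0.2, rrr=rrr)
    print(f"RRR = {rrr:.0%}: about {events:.0f} events")
```

Under these assumptions, smaller relative risk reductions need more events, and fewer events are needed as the control group event rate rises.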

Quality assessment criteria

Study design          Quality of evidence   Lower if…
Randomized trial      High (4)              Study limitations (design and execution)
                      Moderate (3)          Inconsistency
Observational study   Low (2)               Indirectness
                      Very low (1)          Imprecision
                                            Publication bias

Higher if…

What can raise the quality of evidence?

BMJ 2003;327:1459–61

Quality assessment criteria

Study design          Quality of evidence   Lower if…            Higher if…
Randomized trial      High (4)              Study limitations    Large effect (e.g., RR 0.5)
                      Moderate (3)          Inconsistency        Very large effect (e.g., RR 0.2)
Observational study   Low (2)               Indirectness         Evidence of dose-response gradient
                      Very low (1)          Imprecision          All plausible confounding would reduce a demonstrated effect
                                            Publication bias
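The criteria can be read as simple bookkeeping: start at 4 for randomized trials or 2 for observational studies, subtract levels for the "lower if" factors, add levels for the "higher if" factors, and stay within 1 to 4. The sketch below encodes that reading. The function name and the dictionary inputs are illustrative, and the additive scoring is a simplification; how many levels to move for each factor remains a judgment.

```python
QUALITY_LABELS = {4: "High", 3: "Moderate", 2: "Low", 1: "Very low"}
START_SCORE = {"randomized trial": 4, "observational study": 2}


def grade_quality(study_design, downgrades=None, upgrades=None):
    """Return the quality category after applying judged downgrades and upgrades.

    downgrades / upgrades map a factor name to the number of levels (judged by the rater).
    """
    score = START_SCORE[study_design]
    score -= sum((downgrades or {}).values())
    score += sum((upgrades or {}).values())
    score = max(1, min(4, score))  # keep the result within Very low (1) to High (4)
    return QUALITY_LABELS[score]


# Randomized trials with serious imprecision: High (4) minus 1 level.
print(grade_quality("randomized trial", downgrades={"imprecision": 1}))
# Observational studies with a large effect: Low (2) plus 1 level.
print(grade_quality("observational study", upgrades={"large effect": 1}))
```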

Categories of quality

High: Further research is very unlikely to change our confidence in the estimate of effect

Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate

Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate

Very low: Any estimate of effect is very uncertain

Judgements about the overall quality of evidence

Most systems not explicit

Options:

• Benefits

• Primary outcome

• Highest

• Lowest

Beyond the scope of a systematic review

GRADE: Based on lowest of all the critical outcomes
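The GRADE rule for overall quality can be sketched as taking the minimum over the outcomes judged critical. The outcome names and ratings below are hypothetical, and deciding which outcomes are critical is itself a judgment made before applying the rule.

```python
QUALITY_ORDER = ["Very low", "Low", "Moderate", "High"]


def overall_quality(outcome_ratings, critical_outcomes):
    """Overall quality is the lowest rating among the critical outcomes."""
    critical = [outcome_ratings[name] for name in critical_outcomes]
    return min(critical, key=QUALITY_ORDER.index)


# Hypothetical ratings: only mortality and stroke are judged critical here.
ratings = {"mortality": "High", "stroke": "Moderate", "nausea": "Low"}
print(overall_quality(ratings, critical_outcomes=["mortality", "stroke"]))
```

The low rating for the non-critical outcome does not pull the overall quality down in this example.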

Practical points: Cochrane risk of bias tool

1. Adequate sequence generation?

2. Adequate allocation concealment?

3. Adequate blinding of participants, personnel, and outcome assessors? (assess each main outcome)

4. Incomplete outcomes data adequately addressed? (assess each main outcome)

5. Free of selective outcome reporting?

6. Free of other sources of bias?

Judgment: Yes (low risk of bias), No (high risk of bias), Unclear


Risk of bias graph in RevMan 5


From risk of bias to quality of evidence for main outcomes

Low risk of bias

• Across studies: Most information is from studies at low risk of bias

• Interpretation: Plausible bias unlikely to seriously alter the results

• Considerations: No apparent limitations

• GRADE: No serious limitations, do not downgrade

Unclear risk of bias

• Across studies: Most information is from studies at low or unclear risk of bias

• Interpretation: Plausible bias that raises some doubt about the results

• Considerations: Potential limitations are unlikely to lower confidence in the estimate of effect. GRADE: No serious limitations, do not downgrade

• Considerations: Potential limitations are likely to lower confidence in the estimate of effect. GRADE: Serious limitations, downgrade 1 level

High risk of bias

• Across studies: The proportion of information from studies at high risk of bias is sufficient to affect the interpretation of results

• Interpretation: Plausible bias that seriously weakens confidence in the results

• Considerations: Crucial limitation for one criterion, or some limitations for multiple criteria, sufficient to lower confidence in the estimate of effect. GRADE: Serious limitations, downgrade 1 level

• Considerations: Crucial limitation for one or more criteria sufficient to substantially lower confidence in the estimate of effect. GRADE: Very serious limitations, downgrade 2 levels
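The table can be sketched as a two-step lookup: summarize risk of bias across studies by the share of information (for example, meta-analysis weight) in each category, then list the GRADE options that category allows. The sketch reads "most information" as more than half of the weight, and the 25% cut-off for "sufficient to affect the interpretation" is a placeholder, not a GRADE or Cochrane rule. The final choice among the options still requires judgment.

```python
GRADE_OPTIONS = {
    "low": ["no serious limitations, do not downgrade"],
    "unclear": ["no serious limitations, do not downgrade",
                "serious limitations, downgrade 1 level"],
    "high": ["serious limitations, downgrade 1 level",
             "very serious limitations, downgrade 2 levels"],
}


def summarize_risk_of_bias(studies, high_share_threshold=0.25):
    """Summarize risk of bias across studies given (weight, risk) pairs.

    high_share_threshold is a hypothetical cut-off standing in for a reviewer's judgment.
    """
    total = sum(weight for weight, _ in studies)
    share = {level: sum(w for w, risk in studies if risk == level) / total
             for level in ("low", "unclear", "high")}
    if share["high"] >= high_share_threshold:
        return "high"
    if share["low"] > 0.5:
        return "low"
    return "unclear"


# Hypothetical meta-analysis weights (percent) and per-study risk-of-bias judgments.
studies = [(40, "low"), (50, "unclear"), (10, "high")]
summary = summarize_risk_of_bias(studies)
print(summary, "->", GRADE_OPTIONS[summary])
```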
