Guideline development in TB diagnostics

Karen R Steingart, MD, MPH McGill University, Montreal, July 2012

[email protected]

Conflicts of interest

• I am an Editor with the Cochrane Infectious Diseases Group

• I am a member of the GRADE Working Group

• I have no financial interests to declare

Overview

• What are guidelines?

• Quality of guidelines

• Institute of Medicine standards on guidelines

• The GRADE (Grades of Recommendation Assessment, Development and Evaluation) approach

Many factors enter into healthcare decisions…

• What alternatives are available?

• What does the evidence suggest about their potential benefits and harms?

• How firm is the evidence?

• Is there reason to adjust expectations based on a particular patient’s age, gender, race, comorbidities, or other attributes?

• How might different patient preferences affect the best choice for a particular patient?

• Are there any social, economic, or other practical considerations that could affect the results of a particular care option?

Harvey V. Fineberg, MD, PhD, President, Institute of Medicine, February 2011

What are guidelines?

“Clinical practice guidelines are statements that include recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of benefits and harms of alternative care options.” Institute of Medicine, 2011

To be trustworthy, guidelines should

• be based on a systematic review of the existing evidence

• be developed by a knowledgeable, multidisciplinary panel of experts and representatives from key affected groups; consider important patient subgroups and patient preferences, as appropriate

• be based on an explicit and transparent process that minimizes distortions, biases, and conflicts of interest

• provide a clear explanation of the logical relationships between alternative care options and health outcomes, and provide ratings of both the quality of evidence and the strength of the recommendations

• be reconsidered and revised as appropriate when important new evidence warrants modifications of recommendations

http://www.iom.edu/Activities/Quality/ClinicPracGuide.aspx


Institute of Medicine standards for guidelines

1. Establishing transparency

2. Management of conflict of interest

3. Guideline development group composition

4. Clinical practice guideline-systematic review intersection

5. Establishing evidence foundations for and rating strength of recommendations

6. Articulation of recommendations

7. External review

8. Updating



Quality of guidelines – 1

Oxman AD et al. Lancet 2007


TB guidelines - mean scores in selected domains

36 guidelines published from January 1998 to May 2008

• Scope and purpose - 70% (range 22–100)

• Stakeholder involvement - 27% (range 3–86)

• Rigour of development - 24% (range 6–95)

• Clarity of presentation - 56% (range 28–97)

• Applicability - 27% (range 0–93)

• Editorial independence - 23% (range 0–100)

Based on AGREE (Appraisal of Guidelines, Research and Evaluation) instrument

Gallardo CR et al. IJTLD 2010
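The percentages above are standardized domain scores. As a minimal sketch of how such a score is usually obtained with the AGREE instrument, the function below scales the summed item ratings of all appraisers between the minimum and maximum possible totals. The function name, the input layout and the default 1 to 4 rating scale are assumptions made for illustration; they are not taken from the slides or from Gallardo et al.

```python
def agree_domain_score(ratings, scale_min=1, scale_max=4):
    """Standardized AGREE domain score as a percentage.

    ratings: one list of item ratings per appraiser (hypothetical layout).
    scale_min / scale_max: bounds of the rating scale (assumed 1 to 4 here).
    """
    if not ratings or not ratings[0]:
        raise ValueError("at least one appraiser and one item are required")
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must rate the same items")

    n_appraisers = len(ratings)
    obtained = sum(sum(r) for r in ratings)
    min_possible = scale_min * n_items * n_appraisers
    max_possible = scale_max * n_items * n_appraisers
    # Scale the obtained total between the worst and best possible totals.
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Two appraisers rating a three-item domain (made-up ratings).
score = agree_domain_score([[2, 3, 1], [2, 2, 2]])
print(f"{score:.1f}%")
```

A score of 0% means every appraiser gave every item the lowest rating; 100% means every item received the highest rating.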


Guidelines on interferon-gamma release assays…

• 33 guidelines from 25 countries and 2 supranational organizations were identified

• There was considerable diversity among IGRA guidelines, especially for children

• 78% of guidelines cited systematic reviews of available data

• 70% did not use objective and transparent grading systems for guideline development

• A majority of the guidelines did not include statements on conflict of interest

Denkinger C, Dheda K, Pai M. Clin Microbiol Infect 2011

What are WHO guidelines?

• "Guidelines are recommendations intended to assist providers and recipients of health care and other stakeholders to make informed decisions. Recommendations may relate to clinical interventions, public health activities, or government policies."

WHO 2003, 2007

http://www.who.int/tb/advisory_bodies/research_to_policy/en/index.html

The Grading of Recommendations Assessment, Development and Evaluation

Guyatt GH et al. BMJ 2008

Schunemann HJ et al. BMJ 2008

The GRADE approach

• Used to create clinical practice guidelines

• Based on a systematic and transparent assessment of the evidence

• GRADE is not a system for performing systematic reviews and meta-analyses

• GRADE separates the judgment on quality of evidence from strength of recommendations

• GRADEpro has been developed to summarize the evidence and grade its quality http://ims.cochrane.org/revman/gradepro

• Additional information on GRADE www.gradeworkinggroup.org

Adapted from PLoS Med 7(8): e1000322. doi:10.1371/journal.pmed.1000322

GRADE Uptake

• World Health Organization

• Advisory Committee on Immunization Practices

• Allergic Rhinitis in Asthma Guidelines (ARIA)

• American Thoracic Society

• American College of Physicians

• European Respiratory Society

• British Medical Journal

• Infectious Diseases Society of America

• American College of Chest Physicians

• UpToDate®

• National Institute for Health and Clinical Excellence (NICE)

• Scottish Intercollegiate Guidelines Network (SIGN)

• Cochrane Collaboration

• Clinical Evidence

• Agency for Healthcare Research and Quality (AHRQ)

• Partner of GIN

• Over 60 major organizations

“The move toward standardizing recommendations is expected to improve transparency, consistency, and communication in the health care setting and between physicians and their patients.”

5/11/2012

http://www.forbes.com/sites/gerganakoleva/2012/05/11/revised-recommendations-for-vaccines-are-being-phased-in-cdc-report-says/

Evidence based healthcare decisions

Research evidence

Population values and preferences

(Clinical) state and circumstances

Expertise

Haynes et al. 2002

Guideline development

Process

Prioritise problems & scoping

Establish guideline panel and develop questions, including outcomes

Find and critically appraise systematic review(s)

and/or Prepare protocol(s) for systematic review(s)

and Prepare systematic review(s)

(searches, selection of studies, data collection and analysis)

Prepare an evidence profile

Assess the quality of evidence for each outcome

Prepare a Summary of Findings table

If developing guidelines:

Assess the overall quality of evidence and

Decide on the direction (which alternative) and strength of the recommendation

Draft guideline

Consult with stakeholders and/or external peer reviewers

Disseminate guidelines

Update review or guidelines when needed

Adapt guidelines, if needed

Prioritise guidelines/recommendations for implementation

Implement or support implementation of the guidelines

Evaluate the impact of the guidelines and implementation strategies

Update systematic review/guidelines

Systematic review

• Question framed as PICO

• Outcomes (four shown), labelled Critical, Important, Critical, Low

• Summary of findings & estimate of effect for each outcome

• Rate the quality of evidence for each outcome: RCTs start high, observational data start low

• Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias

• Grade up: 1. Large effect 2. Dose response 3. Confounders

• Quality levels: Very low, Low, Moderate, High

• Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes

Formulate recommendations:

• For or against (direction)

• Strong or weak (strength)

By considering: Quality of evidence, Balance benefits/harms, Values and preferences

Revise if necessary by considering: Resource use (cost)

• “We recommend using…”

• “We suggest using…”

• “We recommend against using…”

• “We suggest against using…”
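The rule that overall quality follows the lowest quality among critical outcomes can be written down in a few lines. This is only an illustrative sketch: the tuple layout, the function name and the example outcomes are hypothetical, and in practice the rating of each outcome is a panel judgment rather than a computed value.

```python
# Quality levels ordered from lowest to highest, as on the slide.
LEVELS = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes):
    """Return the lowest quality rating among outcomes marked critical.

    outcomes: list of (name, importance, quality) tuples (hypothetical layout).
    """
    critical = [quality for _, importance, quality in outcomes
                if importance == "critical"]
    if not critical:
        raise ValueError("no critical outcomes to rate")
    # The position in LEVELS defines the ordering of the ratings.
    return min(critical, key=LEVELS.index)


# Made-up example: the important (non-critical) outcome does not drive the result.
example = [
    ("outcome A", "critical", "high"),
    ("outcome B", "important", "very low"),
    ("outcome C", "critical", "moderate"),
]
print(overall_quality(example))
```

Only outcomes labelled critical enter the minimum, so a poorly supported but non-critical outcome does not pull the overall rating down.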

QUALITY OF EVIDENCE

• There always is evidence: “When there is a question there is evidence”

• Better research ⇒ greater confidence in the evidence and decisions

Quality of evidence is about confidence

• In systematic reviews - Confidence that the point estimate is correct

• In guidelines - Confidence that the estimate supports a recommendation

Belief ≠ confidence

Confidence in the evidence using the GRADE approach: likelihood of and confidence in an outcome

How to grade the quality of evidence

• Evidence varies from

⊕⊕⊕⊕/High

⊕⊕⊕/Moderate

⊕⊕/Low

⊕/Very low

Significance of the four levels of evidence

Quality level: Definition

High ⊕⊕⊕⊕: We are very confident that the true effect lies close to that of the estimate of the effect

Moderate ⊕⊕⊕: We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different

Low ⊕⊕: Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect

Very low ⊕: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect

H. Balshem et al. Journal of Clinical Epidemiology 2010

A simple hierarchy

STUDY DESIGN

Randomized Controlled Trials

Cohort Studies and Case Control Studies

Case Reports and Case Series, Non-systematic observations

Expert opinion

BIAS

BMJ, volume 327, 20–27 December 2003

Relative risk reduction: > 99.9% (1/100,000)

U.S. Parachute Association reported 821 injuries and 18 deaths out of 2.2 million jumps in 2007

Simple hierarchies are (too) simplistic

STUDY DESIGN

Randomized Controlled Trials

Cohort Studies and Case Control Studies

Case Reports and Case Series, Non-systematic observations

Expert opinion

BIAS

Expert opinion

Study design for diagnostic accuracy reviews

• Randomized controlled trials, cross-sectional or cohort studies

• Patients with diagnostic uncertainty

• Direct comparison of test results

• Appropriate reference standard

• These studies are considered high quality and can move to moderate, low or very low depending on other factors

Determinants of quality of evidence (downgrading)

5 factors can lower quality

1. Limitations in detailed design and execution (risk of bias criteria)

2. Indirectness (PICO and applicability)

3. Inconsistency (or heterogeneity)

4. Imprecision (number of events and confidence intervals)

5. Publication bias (difficult to appraise in systematic reviews of diagnostic test accuracy)

Determinants of quality of evidence (upgrading)

3 factors may increase quality

1. Large magnitude of effect

2. All plausible residual confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed

3. Dose-response gradient
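The downgrading and upgrading factors above can be pictured as steps along the four-level scale. The sketch below is a deliberate simplification with hypothetical names: GRADE ratings are structured judgments, not arithmetic, and counting one level per factor is an assumption made only to show the direction of each adjustment.

```python
# Quality levels ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


def rate_quality(start, downgrades=None, upgrades=None):
    """Move from a starting level by the levels lost and gained.

    start: starting level, e.g. "high" for RCTs or "low" for observational data.
    downgrades / upgrades: dict mapping a factor name to the number of
    levels (0, 1 or 2) attributed to it; names and layout are illustrative.
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    position = LEVELS.index(start)
    position -= sum(downgrades.values())
    position += sum(upgrades.values())
    # The scale is bounded: nothing rates below "very low" or above "high".
    position = max(0, min(position, len(LEVELS) - 1))
    return LEVELS[position]


# A body of evidence that starts high but has serious risk of bias
# and serious imprecision ends up at "low".
print(rate_quality("high", {"risk of bias": 1, "imprecision": 1}))
```

Keeping the factor names in the dictionaries makes the reasons for a rating visible, which mirrors the footnotes of an evidence profile.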

1. Limitations

QUADAS criteria (risk of bias)

• Representativeness of the population that was intended to be sampled

• Blinded comparison with the best reference standard or alternative strategy

• All enrolled patients should receive the new test and the reference standard

• Diagnostic uncertainty should be given (TB suspects)

• Is the reference standard likely to classify the target condition?

2. Indirectness

The quality of evidence can be lowered if:

• there are important differences between the populations studied and those for whom the recommendation is intended (in prior testing, the spectrum of disease or co-morbidity)

• there are important differences in the tests studied and the diagnostic expertise of those applying them in the studies compared to the settings for which the recommendations are intended

• the tests being compared are each compared to a reference (gold) standard in different studies and not directly compared in the same studies

• Also, in systematic reviews of diagnostic test accuracy, we may downgrade for ‘indirectness’ because diagnostic accuracy (e.g., sensitivity and specificity) is often used as a proxy for patient-important outcomes

3. Inconsistency

• The quality of evidence can be lowered if there is unexplained inconsistency in sensitivity, specificity or likelihood ratios

GenoType MTBDR assays for the diagnosis of multidrug-resistant tuberculosis: a meta-analysis. Ling et al. Eur Respir J. 2008

Forest plot of sensitivity (a) and specificity (b) estimates for rifampicin resistance

Inconsistency?

Inconsistency?

Forest plots of sensitivity and specificity, Anda-TB IgG for the diagnosis of extrapulmonary tuberculosis

Steingart, PLoS Med 2011

4. Imprecision

• Wide confidence intervals for estimates of test accuracy can lower the quality of evidence

The GenoType® MTBDRsl test, detection of XDR-TB, indirect testing

Pooled sensitivity 63.3% (95% CI 36.8, 83.5)

Pooled specificity 98.5% (95% CI 96.0, 99.4)

Steingart, unpublished
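To see why few observations give wide intervals, the sketch below computes a Wilson score interval for a single proportion at two sample sizes. It is an illustration only: the counts are made up, and pooled estimates such as those above come from a meta-analytic model, not from this single-study formula.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Same observed sensitivity (about 63%) from hypothetical studies of different size.
for detected, cases in [(19, 30), (190, 300)]:
    low, high = wilson_interval(detected, cases)
    print(f"{detected}/{cases}: {100 * low:.1f}% to {100 * high:.1f}%")
```

With the same observed proportion, the interval from the larger hypothetical study is much narrower, which is the situation in which imprecision would be less of a concern.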

5. Publication Bias - Underestimation or overestimation of study results due to selective publication of studies

• Diagnostic accuracy reviews do not allow for formal assessment of publication bias using methods such as funnel plots or regression tests because such techniques have not been found to be helpful for diagnostic test accuracy studies

• Unfortunately, it is very difficult to be confident that publication bias is absent, and almost equally difficult to know where to place the threshold and rate down for its likely presence.

• The terms GRADE suggests using are “undetected” and “strongly suspected”

Example of a GRADE Evidence Profile, MTBDRsl assay for detection of resistance to fluoroquinolones, indirect testing

Steingart KR unpublished

Footnotes

A1 8/13 of studies used a cross-sectional design; 5/13 studies used a case-control design.

A2 We assessed study limitations using the QUADAS 2 tool.

Example, GRADE Summary of Findings Table

• Review question: What is the diagnostic accuracy of MTBDRsl assay for detection of resistance to second-line anti-TB drugs?

• Patients/population: Confirmed TB cases

• Setting: Clinical centers and laboratories

• Index test: GenoType® MTBDRsl assay

• Importance: Compared with conventional drug susceptibility testing, genotypic methods, such as MTBDRsl assay, have considerable advantages for scaling up programmatic management and surveillance of drug-resistant TB, offering speed of diagnosis, standardized testing, potential for high throughput, and fewer requirements for laboratory bio-safety

• Reference standard: Conventional drug susceptibility testing by solid or liquid culture; some studies used DNA sequencing

• Studies: Cross-sectional, cohort, or case-control

Example calculation for determining the number of patients classified as TP, TN, FP, FN per 1,000 based on a pre-test probability of 10% (based on a population with 10% prevalence of TB in the target population)

Adapted from Hsu J et al. Implement Sci 2011

Reference standard, at 10% prevalence: disease present in 100 per 1,000, disease absent in 900 per 1,000

• Index test positive: TP = sensitivity × 100; FP = (1 − specificity) × 900

• Index test negative: FN = (1 − sensitivity) × 100; TN = specificity × 900
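The calculation above is easy to reproduce for any sensitivity, specificity and pre-test probability. The following is a minimal sketch; the function name and return layout are hypothetical, and the accuracy values in the usage example are borrowed from the pooled MTBDRsl estimates shown earlier purely as illustrative inputs.

```python
def per_1000(sensitivity, specificity, prevalence=0.10, cohort=1000):
    """Expected TP, FN, FP and TN in a cohort, given accuracy and prevalence."""
    for value in (sensitivity, specificity, prevalence):
        if not 0 <= value <= 1:
            raise ValueError("sensitivity, specificity and prevalence must be in [0, 1]")
    diseased = prevalence * cohort        # 100 per 1,000 at 10% prevalence
    not_diseased = cohort - diseased      # 900 per 1,000 at 10% prevalence
    return {
        "TP": sensitivity * diseased,
        "FN": (1 - sensitivity) * diseased,
        "FP": (1 - specificity) * not_diseased,
        "TN": specificity * not_diseased,
    }


# Illustrative inputs: sensitivity 63.3%, specificity 98.5%, prevalence 10%.
counts = per_1000(0.633, 0.985)
for label, value in counts.items():
    print(f"{label}: {value:.1f} per 1,000")
```

Changing the `prevalence` argument shows how the same test yields a different mix of false positives and false negatives in a different population.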

STRENGTH OF RECOMMENDATIONS

Getting from evidence to recommendations - GRADE

• Recommendations are judgments:

– Quality of evidence

– Trade off between benefits and harms

– Values and preferences

– Resource use

Determinants of strength of recommendation

Factor: Comment

Balance between desirable and undesirable effects: The larger the difference between desirable and undesirable effects, the higher the likelihood that a strong recommendation is warranted

Quality of evidence: The stronger the quality of evidence, the higher the likelihood that a strong recommendation is warranted

Values and preferences: The more values and preferences vary, the higher the likelihood that a weak recommendation is warranted

Costs (resource allocation): The higher the costs, that is the greater the resources consumed, the lower the likelihood that a strong recommendation is warranted

Guyatt GH et al. BMJ 2008

Managing conflicts of interest

Place equal emphasis on intellectual and financial conflicts; provide explicit criteria

Intellectual conflict of interest - academic activities that create the potential for an attachment to a specific point of view that could unduly affect an individual’s judgment about a specific recommendation

- receipt of a grant

- participation in research

- commentary directly related to that recommendation

Challenges with GRADE

• Learning the GRADE process itself

• Patient outcomes may not reflect the accuracy or benefit of a diagnostic test/approach because treatment is unavailable

• The possible tension (for TB diagnosis and control) between the importance of individual patient outcomes and public health outcomes

• Diagnostic RCTs are rarely available and hard to do (ethics, cost, etc.)

• Grading may be done inconsistently across tests by different systematic reviewers; same evidence can be interpreted and rated differently

Overview diagram of the GRADE process: systematic review (PICO, outcomes rated by importance), quality of evidence rated per outcome and overall (grade down, grade up; Very low, Low, Moderate, High), then recommendations formulated by direction and strength

http://cebgrade.mcmaster.ca/

Summary

• For key recommendations:

– Search for and retrieve all available evidence

– Identify relevant systematic reviews

– Formally assess quality of evidence

– GRADE (systematic and transparent approach)

– Transparently lay out rationale for recommendations

– Manage conflicts of interest

http://cebgrade.mcmaster.ca/Summary/index.html

References

1. Barbui C, Dua T, van Ommeren M, Yasamy MT, Fleischmann A, et al. (2010) Challenges in developing evidence-based recommendations using the GRADE approach: the case of mental, neurological, and substance use disorders. PLoS Med 7.

2. Bastian H, Glasziou P, Chalmers I (2010) Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med 7: e1000326.

3. Denkinger CM, Dheda K, Pai M (2011) Guidelines on interferon-gamma release assays for tuberculosis infection: concordance, discordance or confusion? Clin Microbiol Infect 17: 806-814.

4. Gallardo CR, Rigau D, Irfan A, Ferrer A, Cayla JA, et al. (2010) Quality of tuberculosis guidelines: urgent need for improvement. Int J Tuberc Lung Dis 14: 1045-1051.

5. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, et al. (2008) GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336: 924-926.

6. Hsu J, Brozek JL, Terraciano L, Kreis J, Compalati E, et al. (2011) Application of GRADE: Making Evidence-Based Recommendations about Diagnostic Tests in Clinical Practice Guidelines. Implement Sci 6: 62.

7. Oxman AD, Lavis JN, Fretheim A (2007) Use of evidence in WHO recommendations. World Hosp Health Serv 43: 14-20.

8. Pai M, Minion J, Steingart K, Ramsay A (2010) New and improved tuberculosis diagnostics: evidence, policy, practice, and impact. Curr Opin Pulm Med 16: 271-284.

9. Schünemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, et al. (2008) Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 336: 1106-1110.

10. http://www.iom.edu/Activities/Quality/ClinicPracGuide.aspx

11. http://www.gradeworkinggroup.org/

