Guideline development in TB diagnostics
Karen R Steingart, MD, MPH McGill University, Montreal, July 2012
Conflicts of interest
• I am an Editor with the Cochrane Infectious Diseases Group
• I am a member of the GRADE Working Group
• I have no financial interests to declare
Overview
• What are guidelines?
• Quality of guidelines
• Institute of Medicine standards on guidelines
• The GRADE (Grades of Recommendation Assessment, Development and Evaluation) approach
Many factors enter into healthcare decisions…
• What alternatives are available?
• What does the evidence suggest about their potential benefits and harms?
• How firm is the evidence?
• Is there reason to adjust expectations based on a particular patient’s age, gender, race, comorbidities, or other attributes?
• How might different patient preferences affect the best choice for a particular patient?
• Are there any social, economic, or other practical considerations that could affect the results of a particular care option?
Harvey V. Fineberg, MD, PhD, President, Institute of Medicine, February 2011
What are guidelines?
“Clinical practice guidelines are statements that include recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of benefits and harms of alternative care options.”
Institute of Medicine, 2011
To be trustworthy, guidelines should
• be based on a systematic review of the existing evidence
• be developed by a knowledgeable, multidisciplinary panel of experts and representatives from key affected groups; consider important patient subgroups and patient preferences, as appropriate
• be based on an explicit and transparent process that minimizes distortions, biases, and conflicts of interest
• provide a clear explanation of the logical relationships between alternative care options and health outcomes, and provide ratings of both the quality of evidence and the strength of the recommendations
• be reconsidered and revised as appropriate when important new evidence warrants modifications of recommendations
http://www.iom.edu/Activities/Quality/ClinicPracGuide.aspx
Institute of Medicine standards for guidelines
1. Establishing transparency
2. Management of conflict of interest
3. Guideline development group composition
4. Clinical practice guideline-systematic review intersection
5. Establishing evidence foundations for and rating strength of recommendations
6. Articulation of recommendations
7. External review
8. Updating
http://www.iom.edu/Activities/Quality/ClinicPracGuide.aspx
Quality of guidelines – 1
Oxman AD et al. Lancet 2007
Quality of guidelines – 2
TB guidelines - mean scores in selected domains
36 guidelines published from January 1998 to May 2008
• Scope and purpose - 70% (range 22–100)
• Stakeholder involvement - 27% (range 3–86)
• Rigour of development - 24% (range 6–95)
• Clarity of presentation - 56% (range 28–97)
• Applicability - 27% (range 0–93)
• Editorial independence - 23% (range 0–100)
Based on AGREE (Appraisal of Guidelines, Research and Evaluation) instrument
Gallardo CR et al. IJTLD 2010
Quality of guidelines – 3
Guidelines on interferon-gamma release assays…
• 33 guidelines from 25 countries and 2 supranational organizations were identified
• There was considerable diversity among IGRA guidelines, especially for children
• 78% of guidelines cited systematic reviews of available data
• 70% did not use objective and transparent grading systems for guideline development
• A majority of the guidelines did not include statements on conflict of interest
Denkinger C, Dheda K, Pai M. Clin Microbiol Infect 2011
What are WHO guidelines?
• "Guidelines are recommendations intended to assist providers and recipients of health care and other stakeholders to make informed decisions. Recommendations may relate to clinical interventions, public health activities, or government policies."
WHO 2003, 2007
http://www.who.int/tb/advisory_bodies/research_to_policy/en/index.html
The Grading of Recommendations Assessment, Development and Evaluation
Guyatt GH et al. BMJ 2008
Schunemann HJ et al. BMJ 2008
The GRADE approach
• Used to create clinical practice guidelines
• Based on a systematic and transparent assessment of the evidence
• GRADE is not a system for performing systematic reviews and meta-analyses
• GRADE separates the judgment on quality of evidence from strength of recommendations
• GRADEpro has been developed to summarize the evidence and grade its quality http://ims.cochrane.org/revman/gradepro
• Additional information on GRADE www.gradeworkinggroup.org
Adapted from PLoS Med 7(8): e1000322. doi:10.1371/journal.pmed.1000322
GRADE Uptake
• World Health Organization
• Advisory Committee on Immunization Practices
• Allergic Rhinitis and its Impact on Asthma (ARIA) guidelines
• American Thoracic Society
• American College of Physicians
• European Respiratory Society
• British Medical Journal
• Infectious Diseases Society of America
• American College of Chest Physicians
• UpToDate®
• National Institute for Health and Clinical Excellence (NICE)
• Scottish Intercollegiate Guidelines Network (SIGN)
• Cochrane Collaboration
• Clinical Evidence
• Agency for Healthcare Research and Quality (AHRQ)
• Partner of GIN
Over 60 major organizations
“The move toward standardizing recommendations is expected to improve transparency, consistency, and communication in the health care setting and between physicians and their patients.”
5/11/2012
http://www.forbes.com/sites/gerganakoleva/2012/05/11/revised-recommendations-for-vaccines-are-being-phased-in-cdc-report-says/
Evidence based healthcare decisions
Research evidence
Population values and preferences
(Clinical) state and circumstances
Expertise
Haynes et al. 2002
Guideline development
Process
Prioritise problems & scoping
Establish guideline panel and develop questions, including outcomes
Find and critically appraise systematic review(s)
and/or Prepare protocol(s) for systematic review(s)
and Prepare systematic review(s)
(searches, selection of studies, data collection and analysis)
Prepare an evidence profile
Assess the quality of evidence for each outcome
Prepare a Summary of Findings table
If developing guidelines:
Assess the overall quality of evidence and
Decide on the direction (which alternative) and strength of the recommendation
Draft guideline
Consult with stakeholders and/or external peer reviewers
Disseminate guidelines
Update review or guidelines when needed
Adapt guidelines, if needed
Prioritise guidelines/recommendations for implementation
Implement or support implementation of the guidelines
Evaluate the impact of the guidelines and implementation strategies
Update systematic review/guidelines
Systematic review
Guideline development
PICO
Importance of each outcome: Critical, Important, Critical, Low
Summary of findings & estimate of effect for each outcome
RCT start high, obs. data start low
Grade down:
1. Risk of bias
2. Inconsistency
3. Indirectness
4. Imprecision
5. Publication bias
Grade up:
1. Large effect
2. Dose response
3. Confounders
Quality levels: Very low, Low, Moderate, High
Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes
Formulate recommendations:
• For or against (direction)
• Strong or weak (strength)
By considering:
• Quality of evidence
• Balance benefits/harms
• Values and preferences
Revise if necessary by considering: Resource use (cost)
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
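Two rules from the diagram above lend themselves to a small illustration: the overall quality is the lowest quality among the critical outcomes, and the wording of a recommendation follows from its direction and strength. The sketch below is illustrative only. The function names and the example outcomes are hypothetical, and the pairing of "recommend" with strong and "suggest" with weak recommendations follows the usual GRADE convention rather than anything stated on the slide.

```python
# Illustrative sketch; names and example data are hypothetical.
LEVELS = ["very low", "low", "moderate", "high"]  # ordered from worst to best


def overall_quality(outcomes):
    """Return the lowest quality level among the critical outcomes.

    outcomes: list of (importance, quality) tuples, e.g. ("critical", "low").
    """
    critical = [quality for importance, quality in outcomes if importance == "critical"]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=LEVELS.index)


def recommendation_phrase(direction, strength):
    """Build the standard wording from direction ('for'/'against') and strength."""
    verb = {"strong": "We recommend", "weak": "We suggest"}[strength]
    tail = {"for": "using…", "against": "against using…"}[direction]
    return f"{verb} {tail}"


example_outcomes = [
    ("critical", "high"),
    ("important", "very low"),  # ignored: not a critical outcome
    ("critical", "low"),
]
print(overall_quality(example_outcomes))          # low
print(recommendation_phrase("against", "weak"))   # We suggest against using…
```

Note that the "very low" rating of the merely important outcome does not pull the overall rating down; only critical outcomes count.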
QUALITY OF EVIDENCE
• There always is evidence: “When there is a question there is evidence”
• Better research ⇒ greater confidence in the evidence and decisions
Quality of evidence is about confidence
• In systematic reviews - Confidence that the point estimate is correct
• In guidelines - Confidence that the estimate supports a recommendation
Belief ≠ confidence
Confidence in the evidence using the GRADE approach: likelihood of and confidence in an outcome
How to grade the quality of evidence
• Evidence varies from
⊕⊕⊕⊕/High
⊕⊕⊕/Moderate
⊕⊕/Low
⊕/Very low
Significance of the four levels of evidence
Quality level: Definition
High ⊕⊕⊕⊕: We are very confident that the true effect lies close to that of the estimate of the effect
Moderate ⊕⊕⊕: We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
Low ⊕⊕: Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect
Very low ⊕: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect
H. Balshem et al. Journal of Clinical Epidemiology 2010
A simple hierarchy
STUDY DESIGN
Randomized Controlled Trials
Cohort Studies and Case Control Studies
Case Reports and Case Series, Non-systematic observations
Expert opinion
BIAS
BMJ VOLUME 327 20–27 DECEMBER 2003
BMJ 2003
Relative risk reduction: > 99.9% (1/100,000)
U.S. Parachute Association reported 821 injuries and 18 deaths out of 2.2 million jumps in 2007
Simple hierarchies are (too) simplistic
STUDY DESIGN
Randomized Controlled Trials
Cohort Studies and Case Control Studies
Case Reports and Case Series, Non-systematic observations
Expert opinion
BIAS
Expert opinion
Study design for diagnostic accuracy reviews
• Randomized controlled trials, cross-sectional or cohort studies
• Patients with diagnostic uncertainty
• Direct comparison of test results
• Appropriate reference standard
• These studies are considered high quality and can move to moderate, low or very low depending on other factors
Determinants of quality of evidence (downgrading)
5 factors can lower quality
1. Limitations in detailed design and execution (risk of bias criteria)
2. Indirectness (PICO and applicability)
3. Inconsistency (or heterogeneity)
4. Imprecision (number of events and confidence intervals)
5. Publication bias (difficult to appraise in systematic reviews of diagnostic test accuracy)
Determinants of quality of evidence (upgrading)
3 factors may increase quality
1. Large magnitude of effect
2. All plausible residual confounding may be working to reduce the demonstrated effect or increase the effect if no effect was observed
3. Dose-response gradient
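The up- and downgrading logic can be sketched as a small function that starts from an initial level and moves along the four-level scale. This is a minimal illustration, not an official GRADE tool: the function name, the factor labels, and the idea of passing the number of levels per factor are assumptions made for the example. In practice, how many levels to move for each factor is a judgment made by the review authors or guideline panel.

```python
# Illustrative sketch; names and the per-factor level counts are assumptions.
LEVELS = ["very low", "low", "moderate", "high"]  # ordered from worst to best

DOWNGRADE_FACTORS = {
    "risk of bias", "indirectness", "inconsistency", "imprecision", "publication bias",
}
UPGRADE_FACTORS = {"large effect", "residual confounding", "dose-response gradient"}


def rate_quality(start, downgrades=None, upgrades=None):
    """Return the quality level after applying downgrades and upgrades.

    start: initial level (e.g. "high" for RCTs, "low" for observational data).
    downgrades / upgrades: dicts mapping a factor name to the number of levels.
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    unknown = (set(downgrades) - DOWNGRADE_FACTORS) | (set(upgrades) - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unknown factor(s): {sorted(unknown)}")
    index = LEVELS.index(start) - sum(downgrades.values()) + sum(upgrades.values())
    index = max(0, min(index, len(LEVELS) - 1))  # stay within the four levels
    return LEVELS[index]


# Evidence starting high, downgraded one level each for two factors:
print(rate_quality("high", {"indirectness": 1, "imprecision": 1}))  # low
# Observational evidence upgraded one level for a large effect:
print(rate_quality("low", upgrades={"large effect": 1}))            # moderate
```

The result is clamped so that it never falls below "very low" or rises above "high".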
1. Limitations
QUADAS criteria (risk of bias)
• Representativeness of the population that was intended to be sampled
• Blinded comparison with the best reference standard or alternative strategy
• All enrolled patients should receive the new test and the reference standard
• Diagnostic uncertainty should be given (TB suspects)
• Is the reference standard likely to correctly classify the target condition?
2. Indirectness
The quality of evidence can be lowered if:
• there are important differences between the populations studied and those for whom the recommendation is intended (in prior testing, the spectrum of disease or co-morbidity)
• there are important differences in the tests studied and the diagnostic expertise of those applying them in the studies compared to the settings for which the recommendations are intended
• the tests being compared are each compared to a reference (gold) standard in different studies and not directly compared in the same studies
• Also, in systematic reviews of diagnostic test accuracy, we may downgrade for ‘indirectness’ because diagnostic accuracy (e.g., sensitivity and specificity) is often used as a proxy for patient-important outcomes
3. Inconsistency
• The quality of evidence can be lowered if there is unexplained inconsistency in sensitivity, specificity or likelihood ratios.
GenoType MTBDR assays for the diagnosis of multidrug-resistant tuberculosis: a meta-analysis. Ling et al. Eur Respir J. 2008
Forest plot of sensitivity (a) and specificity (b) estimates for rifampicin resistance
Inconsistency?
Inconsistency?
Forest plots of sensitivity and specificity, anda-TB IgG for the diagnosis of extrapulmonary tuberculosis
Steingart, PLoS Med 2011
4. Imprecision
• Wide confidence intervals for estimates of test accuracy can lower the quality of evidence
The GenoType® MTBDRsl test, detection of XDR-TB, indirect testing
Pooled sensitivity 63.3% (95% CI 36.8, 83.5)
Pooled specificity 98.5% (95% CI 96.0, 99.4)
Steingart, unpublished
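To make the contrast in precision concrete, the width of each 95% confidence interval quoted above can be computed directly. This is only an arithmetic illustration; the helper name is hypothetical, and GRADE does not define a fixed width at which evidence must be rated down for imprecision.

```python
# Illustrative arithmetic; the helper name is hypothetical.
def ci_width(lower, upper):
    """Width of a confidence interval, in percentage points."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return round(upper - lower, 1)


sensitivity_width = ci_width(36.8, 83.5)  # pooled sensitivity, 95% CI
specificity_width = ci_width(96.0, 99.4)  # pooled specificity, 95% CI

print(sensitivity_width)  # 46.7
print(specificity_width)  # 3.4
```

The sensitivity estimate spans almost 47 percentage points, whereas the specificity estimate spans under 4, which is why imprecision is a concern mainly for sensitivity in this example.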
5. Publication Bias - Underestimation or overestimation of study results due to selective publication of studies
• Diagnostic accuracy reviews do not allow for formal assessment of publication bias using methods such as funnel plots or regression tests because such techniques have not been found to be helpful for diagnostic test accuracy studies
• Unfortunately, it is very difficult to be confident that publication bias is absent, and almost equally difficult to know where to place the threshold and rate down for its likely presence.
• The terms GRADE suggests using are “undetected” and “strongly suspected”
Example of a GRADE Evidence Profile, MTBDRsl assay for detection of resistance to fluoroquinolones, indirect testing
Steingart KR unpublished
Footnotes
A1 8/13 of studies used a cross-sectional design; 5/13 studies used a case-control design.
A2 We assessed study limitations using the QUADAS-2 tool.
Example, GRADE Summary of Findings Table
• Review question: What is the diagnostic accuracy of MTBDRsl assay for detection of resistance to second-line anti-TB drugs?
• Patients/population: Confirmed TB cases
• Setting: Clinical centers and laboratories
• Index test: GenoType® MTBDRsl assay
• Importance: Compared with conventional drug susceptibility testing, genotypic methods, such as MTBDRsl assay, have considerable advantages for scaling up programmatic management and surveillance of drug-resistant TB, offering speed of diagnosis, standardized testing, potential for high throughput, and fewer requirements for laboratory bio-safety
• Reference standard: Conventional drug susceptibility testing by solid or liquid culture; some studies used DNA sequencing
• Studies: Cross-sectional, cohort, or case-control
Example calculation for determining the number of patients classified as TP, TN, FP, FN per 1,000 based on a pre-test probability of 10% (based on a population with 10% prevalence of TB in the target population)
Adapted from Hsu J et al. Implement Sci 2011
                       Reference standard
                       Disease present                 Disease absent
Index test positive    TP = sensitivity x 100          FP = (1 – specificity) x 900
Index test negative    FN = (1 – sensitivity) x 100    TN = specificity x 900
Prevalence: 10%        100                             900
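The table above can be turned into a short calculation. The sketch below is a minimal illustration of the same arithmetic; the function name is hypothetical, and the sensitivity and specificity values (90% and 98%) are made-up inputs chosen for easy checking, not results from any study cited here.

```python
# Illustrative sketch; function name and example inputs are hypothetical.
def classify_per_1000(sensitivity, specificity, prevalence=0.10, population=1000):
    """Return expected TP, FN, FP and TN counts for a tested population.

    sensitivity, specificity and prevalence are proportions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    diseased = population * prevalence      # 100 per 1,000 at 10% prevalence
    non_diseased = population - diseased    # 900 per 1,000 at 10% prevalence
    return {
        "TP": round(sensitivity * diseased, 1),
        "FN": round((1 - sensitivity) * diseased, 1),
        "FP": round((1 - specificity) * non_diseased, 1),
        "TN": round(specificity * non_diseased, 1),
    }


print(classify_per_1000(0.90, 0.98))
# {'TP': 90.0, 'FN': 10.0, 'FP': 18.0, 'TN': 882.0}
```

With these inputs, 10 of every 1,000 people tested would be missed (FN) and 18 would be wrongly classified as having the condition (FP). Changing the `prevalence` argument shows how the same test produces different absolute numbers in different settings.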
STRENGTH OF RECOMMENDATIONS
Getting from evidence to recommendations - GRADE
• Recommendations are judgments:
– Quality of evidence
– Trade off between benefits and harms
– Values and preferences
– Resource use
Determinants of strength of recommendation
Factor: Comment
Balance between desirable and undesirable effects: The larger the difference between desirable and undesirable effects, the higher the likelihood that a strong recommendation is warranted
Quality of evidence: The stronger the quality of evidence, the higher the likelihood that a strong recommendation is warranted
Values and preferences: The more values and preferences vary, the higher the likelihood that a weak recommendation is warranted
Costs (resource allocation): The higher the costs, that is the greater the resources consumed, the lower the likelihood that a strong recommendation is warranted
Guyatt GH et al. BMJ 2008
Managing conflicts of interest
Place equal emphasis on intellectual and financial conflicts; provide explicit criteria
Intellectual conflict of interest - academic activities that create the potential for an attachment to a specific point of view that could unduly affect an individual’s judgment about a specific recommendation
- receipt of a grant
- participation in research
- commentary directly related to that recommendation
Challenges with GRADE
• Learning the GRADE process itself
• Patient outcomes may not reflect the accuracy or benefit of a diagnostic test/approach because treatment is unavailable
• The possible tension (for TB diagnosis and control) between the importance of individual patient outcomes and public health outcomes
• Diagnostic RCTs are rarely available and hard to do (ethics, cost, etc.)
• Grading may be done inconsistently across tests by different systematic reviewers; same evidence can be interpreted and rated differently
http://cebgrade.mcmaster.ca/
Summary
• For key recommendations:
– Search for and retrieve all available evidence
– Identify relevant systematic reviews
– Formally assess quality of evidence
– GRADE (systematic and transparent approach)
– Transparently lay out rationale for recommendations
– Manage conflicts of interest
http://cebgrade.mcmaster.ca/Summary/index.html
References
1. Barbui C, Dua T, van Ommeren M, Yasamy MT, Fleischmann A, et al. (2010) Challenges in developing evidence-based recommendations using the GRADE approach: the case of mental, neurological, and substance use disorders. PLoS Med 7.
2. Bastian H, Glasziou P, Chalmers I (2010) Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med 7: e1000326.
3. Denkinger CM, Dheda K, Pai M (2011) Guidelines on interferon-gamma release assays for tuberculosis infection: concordance, discordance or confusion? Clin Microbiol Infect 17: 806-814.
4. Gallardo CR, Rigau D, Irfan A, Ferrer A, Cayla JA, et al. (2010) Quality of tuberculosis guidelines: urgent need for improvement. Int J Tuberc Lung Dis 14: 1045-1051.
5. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, et al. (2008) GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336: 924-926.
6. Hsu J, Brozek JL, Terraciano L, Kreis J, Compalati E, et al. (2011) Application of GRADE: Making Evidence-Based Recommendations about Diagnostic Tests in Clinical Practice Guidelines. Implement Sci 6: 62.
7. Oxman AD, Lavis JN, Fretheim A (2007) Use of evidence in WHO recommendations. World Hosp Health Serv 43: 14-20.
8. Pai M, Minion J, Steingart K, Ramsay A (2010) New and improved tuberculosis diagnostics: evidence, policy, practice, and impact. Curr Opin Pulm Med 16: 271-284.
9. Schünemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, et al. (2008) Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 336: 1106-1110.
10. http://www.iom.edu/Activities/Quality/ClinicPracGuide.aspx
11. http://www.gradeworkinggroup.org/