Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
June, 2009
The Methods for the Synthesis of Studies without Control Groups
Donna Fitzpatrick-Lewis • Donna Ciliska • Helen Thomas
2
3
The Methods for the Synthesis of Studies without Control Groups
Prepared for the National Collaborating Centre for Methods and Tools by
Donna Fitzpatrick-Lewis • Donna Ciliska • Helen Thomas
June, 2009
National Collaborating Centre for Methods and Tools (NCCMT)
School of Nursing, McMaster University
Suite 302, 1685 Main Street West
Hamilton, Ontario L8S 1G5
Telephone: (905) 525-9140, ext. 20455
Fax: (905 529-4184
Funded by the Public Health Agency of CanadaAffiliated with McMaster UniversityProduction of this paper has been made possible through a financial contribution from the Public Health Agency of Canada. The views expressed herein do not necessarily represent the views of the Public Health Agency of Canada.
4
5
Contents
Key Recommendations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Relevance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11Data Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Critical Appraisal Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Systematic Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Methods for Synthesizing Quantitative Studies without Control Groups . . . . . . 23Combining Qualitative and Quantitative Studies . . . . . . . . . . . . . . . . . . . . . . . 26Studies Comparing Results Using Differing Methods . . . . . . . . . . . . . . . . . . . . 27
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Tables summarizing results of relevant papers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6
Key Recommendations:1. Incorporate studies without control groups into systematic reviews, especially when
there are no other studies to consider. Studies without control groups can also pro-vide information on long-term effectiveness, rare events and adverse effects.
2. Do not generalize the direction of differences in outcome between randomized and non-controlled studies. While effect sizes are more often smaller in randomized controlled trials (RCTs), some show the reverse or the same results across study designs.
3. Use Whittemore and Knafl’s (2005) approach to include results of non-randomized trials. Whittemore and Knafl provide a five-step integrative review method that incorporates problem identification, literature search, data evaluation, data analysis and presentation of data.
4. Recommendations for methods of systematic review including studies without con-trol groups:
a) Use a critical appraisal tool that has been tested for internal consistency, test-retest and inter-rater reliability, and at least face and criterion validity (Downs & Black, 1998; Slim, Nini, Forestier et al., 2003; Steuten, Vrijhoef, van-Merode et al., 2004; Thomas, Ciliska, Dobbins et al., 2004a; Zaza, Wright-De Aguero, Briss et al., 2000).
b) Do not meta-analyse results from observational studies.5. Suggestions to authors of primary studies:
a) Ensure a strong and transparent study design in all non-randomized trials.b) Confirm that the research question determines the study design.c) Incorporate detailed information about the study; lack of information means
that readers cannot determine what has or has not been done in the study design.
d) Ensure consistency in terminology and descriptions across non-random-ized studies to facilitate comparisons.
6. Implications for further research:a) Conduct reliability and validity testing of any critical appraisal tools. b) Conduct further studies to assess the different results obtained by random-
ized versus non-randomized trials. 7. Including data from non-randomized trials is challenging; however, several benefits
can be achieved:a) A wider body of literature can be included in search strategies (Thomas,
Harden, Oakley et al., 2004).b) The scope of the questions that can be answered may be broadened. c) This research can shed light on why, how and for whom interventions or
programs succeed or fail (Oliver, Harden, Rees et al., 2005).
7
Summary
There is currently no standardized model for synthesizing results of studies that do not have control groups. The purpose of this paper is to examine synthesis methods, critical appraisal tools and studies that deal with appraising and synthesizing quantitative studies without control groups. RCTs are achievable in clinical settings, but public health interventions can rarely replicate the controlled environment of the clinic. Researchers, policy-makers and practice decision-makers often must rely on other types of study designs for their source of evidence. This paper will outline precautions to be aware of when including non-controlled studies as evidence, and recommends effective tools to use to analyse the results from these types of studies.
Methods
With the assistance of a librarian, five electronic databases were searched using keywords. In addition, reference lists of all relevant articles and grey literature were searched.
An article met the relevance criteria if:it was a primary study comparing results of the same intervention using randomized • controlled trials (RCTs) versus designs without control groups;it was a synthesis of methods used to review studies without control groups;• at least one of these designs was included: one group cohort (before/after or just • after), cross-sectional, interrupted time series or mixed methods; it was published between 1995 and November 2007.•
A paper was excluded if:the focus was RCT, clinical controlled trials or quasi-randomized studies;• the focus was a cohort analytic study (before/after on two or more groups);• the study was qualitative only;• it was published before 1995.•
Thirty-six papers passed the relevance test and full data extraction was completed on those papers. Many of the included studies were observational studies. It is important to note that while there are many definitions of observational studies, only those without control groups were included in this paper.
Findings
Fourteen articles provided critical appraisal tools that could assist in judging the method-ological rigour of primary studies. Primary studies for a systematic review should include a quality assessment of the studies. A critical appraisal tool is essential when sifting through evidence in primary studies without control groups. These tools can provide information about the strengths and weaknesses of information. Critical appraisal tools assist the reader in assessing the methodological quality, managing confounders, determining potential bi-ases and performing data analysis. These tools are particularly helpful where the criteria for
8
assessing RCTs are not applicable. Many of the tools have not been tested for validity and reliability; this testing might be a sensible next step.
Eleven systematic reviews incorporated primary studies without control groups, or examined critical appraisal tools for quality assessment of studies without control groups. The reviews that examined the feasibility of combining the results of RCTs and non-RCTs suggest that well-designed studies can produce similar results, regardless of the study type. Many of these reviews suggest that poor execution of the primary study design result in lower quality scoring, rather than the study design itself. They cited under-reporting of methodology, poor management of confounders and/or lack of consistent analytical terminology as problematic when trying to compare data between studies.
Six papers examined various methods of synthesizing and integrating quantitative data from studies without control groups. The authors agreed that there is benefit to incorporating evi-dence from diverse study designs, but quality assessment is paramount for providing trust-worthy evidence. Primary studies that are clear about potential bias and probable impact on effect can be usefully incorporated into systematic reviews.
Three articles described methods of synthesizing data from both qualitative and quantita-tive studies. Incorporating qualitative and quantitative data in one report can help build an understanding of the success or failure of interventions. Assessing the quality of studies without control groups can be problematic, especially if quality is assessed with traditional criteria used in systematic reviews or meta-analyses. Results of quality assessment were not used as exclusion criteria, but could be incorporated in the discussion of findings. Group-ing findings according to a thematic and aggregate method was offered as an appropriate and alternative approach to meta-analysis. Researchers could also incorporate some of the processes used in systematic reviews, such as having two people review and perform data extraction. Most of these authors suggested a narrative approach for presenting results when meta-analysis was not appropriate.
Two papers compared the results of different methodologies applied to the same interven-tion. Those articles pointed out that it is useful to consider different designs in reviews. Not only may they be the best available information, but they also can be more able to answer questions related to long-term effectiveness, adverse events and rare events. The strengths of including different designs in reviews outweigh the limitations.
Recommendations
1. Incorporate studies without control groups into systematic reviews, especially when there are no other studies to consider. Studies without control groups can also pro-vide information on long-term effectiveness, rare events and adverse effects.
2. Do not generalize the direction of differences in outcome between randomized and non-controlled studies. While effect sizes are more often smaller in randomized controlled trials (RCTs), some show the reverse or the same results across study designs.
3. Use Whittemore and Knafl’s (2005) approach to include results of non-randomized trials. Whittemore and Knafl provide a five-step integrative review method that
9
incorporates problem identification, literature search, data evaluation, data analysis and presentation of data.
4. Recommendations for methods of systematic review including studies without con-trol groups:
a) Use a critical appraisal tool that has been tested for internal consistency, test-retest and inter-rater reliability, and at least face and criterion validity (Downs & Black, 1998; Slim, Nini, Forestier et al., 2003; Steuten, Vrijhoef, van-Merode et al., 2004; Thomas, Ciliska, Dobbins et al., 2004a; Zaza, Wright-De Aguero, Briss et al., 2000).
b) Do not meta-analyse results from observational studies.5. Suggestions to authors of primary studies:
a) Ensure a strong and transparent study design in all non-randomized trials.b) Confirm that the research question determines the study design.c) Incorporate detailed information about the study; lack of information means
that readers cannot determine what has or has not been done in the study design.
d) Ensure consistency in terminology and descriptions across non-random-ized studies to facilitate comparisons.
6. Implications for further research:a) Conduct reliability and validity testing of any critical appraisal tools. b) Conduct further studies to assess the different results obtained by random-
ized versus non-randomized trials. 7. Including data from non-randomized trials is challenging; however, several benefits
can be achieved:a) A wider body of literature can be included in search strategies (Thomas,
Harden, Oakley et al., 2004).b) The scope of the questions that can be answered may be broadened. c) This research can shed light on why, how and for whom interventions or
programs succeed or fail (Oliver, Harden, Rees et al., 2005).
10
Introduction
The overriding goal of public health policy and practice is to protect people’s health. To achieve this goal, public health professionals consider evidence when making decisions. Historically, scientists have preferred the randomized controlled trial (RCT) study design to provide the least biased results when analysing the effectiveness of interventions. An RCT may provide rigorous study results for effectiveness questions; however, it might not be the best study design for other types of research questions (DiCenso, Prevost, Benefield et al., 2004). In public health and health promotion, it may be difficult to use randomized trials for a particular question; does that mean we should ignore evidence from other methods? Some researchers believe that only RCTs have the scientific rigour to produce valuable evidence.
In clinical settings, RCTs are achievable. But public health interventions can rarely replicate the controlled environment of the clinic. These interventions are often community-based and recruitment can be challenging, creating a selection bias. As a result, maintaining pure control groups without cross-contamination may be impossible or impractical. Public health researchers must often rely on other types of study designs, often classified as lower on the “hierarchy of evidence” (Ogilvie, Egan, Hamilton et al., 2005). However, observational stud-ies, for example, can provide valuable information for public health decision-making. Qualita-tive information can supplement statistical data by helping us understand the ‘who, why and how’ of intervention success or failure. There is a whole body of literature about the methods for synthesizing qualitative studies (meta-synthesis), but these methods are beyond the scope of this paper.
Due to the greater potential for bias in observational studies, they are often excluded from systematic reviews of treatments. However, they offer several advantages to RCTs, includ-ing: being less costly; allowing for larger sample sizes; and providing more long-term out-come measurements (Benson & Hartz, 2000). As well, observational studies may provide the “best available evidence” in certain situations (Ogilvie et al., 2005).
There is currently no standardized model for synthesizing the results of studies that do not have control groups. The purpose of this paper is to examine synthesis methods, criti-cal appraisal tools and studies that deal with appraising and synthesizing the results from quantitative studies that lack control groups. This paper will also outline some precautions to take when using non-controlled studies as evidence, and will make recommendations for processes and tools.
Methods
The methods used to collect the literature for this paper included:a comprehensive literature search of published literature from January 1995 to • October 2007;review and retrieval of references from relevant articles;• a search and retrieval of potentially relevant grey literature.•
The initial search was conducted by a skilled librarian, using the following key search words:
11
review, systematic review, literature review, meta-analysis, not random and not randomized clinical trials. These key word searches were conducted in a several databases: PsycINFO, OVID MEDLINE, EMBASE, CINAHL and Scholars Portal. The initial search located 9324 potentially relevant articles. Titles and abstracts were read independently by two reviewers who passed on 182 articles for full relevance testing.
Relevance
An article was included for full relevance testing if:it was a primary study comparing results of the same intervention using randomized • trials versus designs without control groups;it was a synthesis of methods used to review studies without control groups;• at least one of these designs was included: one group cohort (before/after or just • after), cross-sectional, interrupted time series or mixed methods; it was published between 1995 and November 2007.•
A paper was excluded if:the focus was RCT, clinical controlled trials or quasi-randomized studies;• the focus was a cohort analytic study (before/after on two or more groups);• the study was qualitative only;• it was published before 1995. •
Thirty-six papers passed the relevance test, and researchers completed full data extrac-tion on those papers. Many of the final papers were observational studies. There are many definitions for observational studies; for this paper, only observational studies with no control groups were considered relevant.
Data Synthesis
Researchers synthesized 36 papers in a narrative format by topic:critical appraisal tools• systematic reviews• methods for synthesizing quantitative studies without control groups• combined qualitative and quantitative studies• comparing results using differing methods•
Some papers could have been included in more than one section; however, we placed the papers in the groups that appeared most suitable.
The paper provides a narrative synopsis of the relevant papers. The results are also detailed in table format (See Appendix 1).
12
Findings
Critical Appraisal Tools
A total of 14 papers described critical appraisal tools.
Rangel, Kelsey, Colby et al., (2003) addressed the lack of a standardized measurement tool to determine the quality of observational studies. They developed a quality assessment in-strument for non-randomized studies that incorporated 30 items within three subscales. The subscales assessed clinical relevance, reporting methodology and the strength of conclu-sions. Global ratings were determined by combining each subscale, and the studies were thereby rated as ‘good,’ ‘fair’ or ‘poor.’ They achieved high inter-rater reliability with levels of agreement reaching 84.6% concordance of results (n = 1573 items). For all individual subscales, there was range of 73.3% to 85.8% (n = 60 to 1258). The potential applications for this tool include:
facilitating the development of standardized methodology for conducting systematic • reviews of retrospective data; helping to develop a strategy for meta-analysis of non-randomized trials; • developing standardized reporting guidelines for use in peer-reviewed journals.•
Margetts et al. (1995) developed a scoring system to judge the scientific quality of obser-vational epidemiological studies linking diet with the risk of cancer. Case-control and cohort studies were reviewed separately because some of the reliability markers applied to only one of these study types. Case-control studies were rated on three broad areas: quality of dietary assessment; recruitment of subject; and the analysis of results. Cohort studies were scored on four areas: dietary assessment, definition of cohort, ascertainment of disease cases and analysis of results. Inter-rater reliability was assessed and resulted in a high level of agreement between reviewers, with the cohort reviewers attaining a slightly higher level of agreement. This prototype scoring system helped the authors to describe the epidemiologi-cal data on diet and cancer; however it may not be generalizable to other topics.
Downs and Black (1998) developed and tested a checklist to assess randomized and non-randomized studies. They used epidemiological principles, reviews of study designs and existing checklists to develop their checklist, consisting of 26 items distributed between five subscales:
reporting• external validity• bias• confounding• power•
The maximum score was 31. Twenty-three of the questions could be asked of any study of any health care intervention. The other three questions were topic sensitive and were cus-tomized to provide the raters with information on confounders, main outcomes and the sam-
13
ple size required for clinical and statistical significance (p<0.05). The quality index had high internal consistency, good test-retest and inter-rater reliability and good face and criterion validity. It performed as well as other established checklists on randomized trials, and there was little difference between its performance with non-randomized and randomized studies.
To address the lack of validated instruments to measure quality of observational or non-ran-domized trials, Slim et al. (2003) tested a methodological index for non-randomized studies (MINORS). This tool includes 12 items (the first eight items were designed specifically for non-comparison trials):
a clearly stated aim• inclusion of consecutive patients• prospective collection of data• endpoints appropriate to the aim of the study• unbiased assessment of the study endpoints• a follow-up period appropriate to the aim of the study• loss to follow-up less than 5%• prospective calculation of the study size• an adequate control group• contemporary groups• baseline equivalence of groups• adequate statistical analysis•
This tool had good reliability, internal consistency and validity.
Zaza et al. (2000) developed a format to classify and provide descriptive components of public health inventions and assess the quality of the study’s execution. The authors ac-knowledged that all study designs have issues particular to their specific design, therefore the quality questions were developed to reflect those specific issues. For example, the authors suggested that while blinding is an important component to consider for randomized trials, it is not appropriate to assess validity in a time series design. Zaza et al. provided cat-egories of questions to assess potential threats to the validity of results in primary studies, including:
description of the invention• sampling• reliability and validity of outcome measurements• the appropriateness of the statistical analysis• the interpretation of results•
One aim of their approach was to design a tool that was flexible enough to be useful in the assessment of divergent study designs and interventions. The standardized abstraction form and procedure improved the validity and reliability of the Guide to Community Prevention Services: Systematic Reviews and Evidence-Based Recommendations.
14
Greer et al. (2000) presented and discussed a system of grading evidence that was devel-oped by the Institute for Clinical Systems Improvement (ICSI). This approach was developed in response to feedback indicating that an existing system was not working. Its goal is to as-sist busy practitioners who need to judge the quality of evidence. After reviewing many other grading systems, the ICSI working group agreed that it was important to separate the evalu-ation of the individual research reports from the assessment of the totality of the evidence supporting the conclusion. The subsequent worksheet for grading evidence—the primary tool—included a statement of conclusion, a summary of the research reports that support or dispute the conclusion, assignment of classes and quality markers to the research reports and assignment of a grade to the conclusion. Primary reports of new data collection were assigned a letter grade of A to D, depending on the type of study: randomized controlled trial; cohort study; non-randomized trial with concurrent or historic controls; case-control study; study of sensitivity and specificity of a diagnostic test; population-based descriptive study; cross-sectional study; case series; or case report. The report also included a five-point quality system to rate both positive and negative study design attributes, with ratings of plus, negative or neutral. The questions included:
a priori inclusion/exclusion criteria• bias• statistical or clinically significant results• generalizability of results to other populations• clearly outlined study design•
This system has been applied to more than 40 ICSI guidelines and technology assessment reports. The authors reported that the system appears to be successful in reducing the complexity of other grading systems, while yielding defendable classification of conclusions based on the strength of the underlying evidence.
Steuten et al. (2004) critiqued the tools used to assess the methodological quality of the Health Technology Assessment (HTA) of disease management. The authors developed an inventory of problems that arise when assessing HTA studies, including: study design (RCT vs. observational); criteria for selection and restriction of patients; baseline and outcome measures; blinding of patients and providers; the description of complex, multifaceted inter-ventions; and avoidance of co-interventions. They developed, proposed and validated a new instrument for assessing the methodological quality of HTA for disease management that includes four components:
study population• description of intervention• measurement of outcomes• data analysis/presentation of data•
The instrument includes 15 items that cover internal validity, external validity and statistical considerations. This instrument is reliable in terms of hierarchical ranking, test-retest reliabil-ity and inter-rater reliability when applied to HTAs of disease management.
Two reports (Chou & Helfand, 2005; McIntosh, Woolacott, & Bagnall, 2004) suggested
15
that evidence of harm is as important as evidence of effect. However, standard systematic reviews focusing on effectiveness and randomized trials may not be an efficient model for evaluating harm. Literature related to harm is found in unpublished trials, observational stud-ies and grey literature. Applying existing quality checklists to this data set was problematic and did not readily capture the types of information pertinent to harmful effects. McIntosh et al. adapted published checklists to reflect information about harmful effects found in obser-vational cohort studies. This new checklist also incorporated items such as how and when events were reported, and whether the time at which they occurred during the study was recorded. However, the greatest barrier to applying any checklist was the lack of method-ological detail provided by the primary study authors. The researchers also included narra-tive outcome reports. Chou and Helfand (2005) examined case reports and observational studies of harmful effects of treatment. They too found quality assessment difficult, as instru-ments for assessing these types of studies were rare or inconsistent. Developmental rigour, scope, and the number and type of items used were all contentious issues. They pilot-tested a quality assessment tool for RCTs, clinical control trials (CCTs) and cohort studies for an uncontrolled surgical series that reported complications from carotid endarterectomy. This eight-point tool provided a ranking of ‘good,’ ‘fair’ or ‘poor.’ The tool was tested on studies for one intervention. The authors cautioned researchers to avoid inappropriate pooling of statis-tical results from observational studies, given the potential for bias due to inadequate control of confounders and selection bias.
The GRADE Working Group (Atkins, Briss, Eccles et al., 2005) reported on the development and pilot testing of the GRADE approach to assessing evidence and recommendations. Twelve evidence profiles were independently graded by 17 judges who all had experience using other approaches to assessing evidence and recommendations. Each evidence profile was based on information available in a systematic review and included two tables, one for quality assessment and one for the summary of the findings. The quality assessment process was designed so that each outcome was evaluated separately. The outcome table displayed the number of studies that had reported that outcome, the study design (RCTs or observational) and the quality of the studies. Assessment of the quality of outcome revealed that factors other than study design, quality, consistency and directness affected the review-er’s judgment about quality. These other factors included: sparse data, strong associations, publication bias, dose response and situations where all plausible confounders strengthened rather than weakened confidence in the direction of effect. The judges were asked about the ease of understanding and the sensibility of the approach; they generally agreed that the GRADE approach was clear, understandable and sensible.
Khan et al. (2001) provided information on what needs to be included in any quality assess-ment checklist. In effectiveness trials, RCTs should be considered first when they are avail-able and well-designed. In the absence of good RCTs, well-designed quasi-experimental and observational studies should be considered. Khan et al. provided several questions to consider when assessing the quality of observational studies. They cautioned that although conclusions can be drawn from these studies, it may be unclear whether the groups within studies are comparable. The authors suggested that results need to be reported with cau-tion, and often with recommendations for further research.
Thomas, Ciliska, Dobbins and Micucci (2004) used a quality assessment instrument that
16
allowed for the methodological rating of studies that were not randomized control trials, for inclusion in a systematic review. This method was specifically developed for quality assess-ment of public health research. Quality assessment components included:
selection bias• study design• confounders• blinding• data collection methods• withdrawals or dropouts•
Studies rated with this system achieved a ‘strong,’ ‘moderate’ or ‘weak’ methodological rat-ing. The instrument had good inter-rater and test-retest reliability. It also had proven content and criterion validity, randomized control trials and clinical control trials were rated as strong and non-RCTs such as cohorts and observational studies were rated as moderate. However, non-randomized studies could achieve an overall study rating of ‘moderate’ if the other com-ponents of the study were strong. Data is extracted from studies and reported in a narrative format. Meta-analysis was rarely done as it was found that public health study populations and interventions are often too heterogeneous for this type of analysis. Data extraction al-lowed for reporting statistically significant (or lack of) effects from each study, and the re-searchers commented on the clinical meaningfulness of any statistically significant effect.
Ramsay et al. (2003) reviewed the quality of Interrupted Time Series (ITS) using studies included in two systematic reviews. They developed a quality assessment tool that incorpo-rated the following criteria:
the intervention occurred independently of other changes over time;• the intervention was unlikely to affect data collection; • the outcome was assessed blindly or measured objectively;• the outcome was reliable or measured objectively;• the composition of the data set at each time point covered at least 80% of all par-• ticipants in the study;the shape of the intervention effect was pre-specified;• the rationale for the number and spacing of data points was described;• the study was analysed appropriately using time series techniques. •
They found that 37 of the 58 studies were not analysed appropriately, and that 33 of those should be re-analysed. Poor study designs, insufficient power and inappropriate analysis of ITS studies resulted in misleading conclusions being presented in the published literature. The authors recommend that all data from ITS studies be re-analysed before inclusion in reviews.
Conn and Rantz (2003) explored ways in which researchers can manage quality when considering primary studies within a meta-analysis. While there are more than 100 scales available to measure the quality of these studies, they vary in size, composition, complexity and extent of development. Establishing the validity of the scales is complicated and chal-lenging. Scales are not reliable and accurate for all areas of science, and application of the
17
scales can be problematic and inconsistent. Conn and Rantz outlined three methods that could potentially manage quality:
setting a quality threshold• weighting by quality scoring• considering quality as an empirical question•
However, as a stand-alone method, each has limitations. For instance, setting a quality threshold may result in the exclusion of less rigorous research—data that may contain im-portant information. Weighting by study quality can be limited by the scaling tools used and some potentially inherent problems such as inter-rater reliability. Finally, considering quality as an empirical question may lead to overall measures masking interesting effects of indi-vidual components of quality on effect-size estimates. To accommodate for these limitations, researchers may need to combine strategies to include studies that are rated as method-ologically weak but still contain important information.
Section Summary
Reviewing literature to be included in a systematic review should include a quality assess-ment. A critical appraisal tool is essential when sifting through evidence in primary studies without control groups. These tools can provide information regarding the strengths and weaknesses of information. Critical appraisal tools assist the reader in assessing the evi-dence provided based on methodological quality, management of confounders, potential biases and data analysis. These tools are particularly helpful where systems of assessment used for RCTs are not applicable. Many of the tools presented here have not been tested for validity and reliability. This is a reasonable next step.
Systematic Reviews
Eleven systematic reviews considered the question of synthesizing data from studies without control groups.
Linde, Scholz, Melchart and Willich (2002) examined whether systematic reviews should include both randomized and non-randomized studies. The researchers examined the data on the use of acupuncture for chronic headaches to explore the following questions:
1. Do randomized and non-randomized studies of acupuncture for chronic headaches differ in regard to patients, interventions, design-independent quality aspects and response rates?
2. Do non-randomized studies provide relevant additional information (results of long-term outcomes, prognostic factors, adverse effects or complications and response rates in representative and well-defined groups of patients)?
3. If response rates in randomized and non-randomized patients differ, what are the possible explanations?
Fifty-nine studies met the inclusion criteria, of which 24 were randomized trials and 35 were non-randomized trials (five non-randomized control trials, 10 prospective uncontrolled stud-
18
ies, 10 case series and 10 cross-sectional surveys). A total of 535 patients received acu-puncture treatment in the 24 RCTs, compared with 2695 in the 35 non-randomized studies. On average, randomized trials had smaller sample sizes, met more quality criteria and had lower response rates to treatment (0.59; 0.40–0.69) vs. (0.78; 0.72–0.83). Regardless of randomization status, studies meeting more quality criteria had lower response to treatment rates. Follow-up time was not significantly greater in the non-randomized studies. In the case of acupuncture for chronic headaches, non-randomized studies confirmed the findings in a previous study (Greenhalgh, Robert, Macfarlane et al., 2005) that the treatment was likely to be effective. However, non-randomized studies provided little additional relevant in-formation on long-term effects, prognostic factors or adverse effects. In general, the authors concluded that non-randomized studies of good quality yielded results similar to RCTs. This suggests that their inclusion in systematic reviews, while increasing the workload for authors of these reviews, may provide a useful extension for generalizability.
MacLehose et al. (2000) investigated the association between methodological quality and estimates of effectiveness by comparing RCTs and quasi-experimental and observational (QEO) studies. The authors employed two strategies when reviewing the literature:
1. comparing RCT and QEO study estimates of effectiveness of any intervention, where both estimates were reported in one paper;
2. comparing the RCT and QEO estimates of effectiveness for specified interventions, where the estimates were reported in different papers.
For strategy 1, the authors identified 14 papers containing 38 comparisons, of which 13 were classified as high quality and 25 were classified as low. Quality assessment criteria included:
study estimates derived from the same population;• blinding of outcome assessors;• the extent to which the QEO study estimate took possible confounding into ac-• count.
The findings indicated that the discrepancies between RCT and QEO study estimates of effect size and outcome frequency for intervention and control groups were smaller for high-quality than for low-quality comparisons. In addition, there was no tendency for QEO study estimates of effect size to be more extreme for high-quality comparisons than low-quality RCTs. For strategy 2, the specific interventions included mammography screening to re-duce breast cancer mortality and folic acid supplements to reduce neutral tube defects. The authors identified 34 papers, of which 12 papers were RCTs, 11 were non-randomized trials or cohort studies and nine were matched or unmatched case-control studies. These studies were assessed based on
the quality of the reporting;• the generalizability of the results;• the extent to which estimates of effectiveness may have been subject to bias or • confounding.
Cohort and case-control studies had lower quality scores than RCTs, and cohort studies had
19
lower quality scores than case-control studies. Meta-regression of study attributes against relative risk estimates showed no association between effect size and study quality. Esti-mates for RCTs and cohort studies were not significantly different; however, case-control studies gave significantly different estimates in the mammography studies (greater benefit) and the folic acid studies (less benefit).
Britton et al. (1998) explored issues related to the process of randomization that may af-fect the validity of conclusions drawn from the results of RCTs and non-randomized studies (including studies without control groups). The authors asked four research questions to examine the issues of randomization and validity:
1. Do non-randomized studies differ systemically from RCTs in terms of treatment ef-fect?
2. Are there systematic differences between the included and excluded individuals, and do these influence the measured treatment effect?
3. To what extent is it possible to adjust for baseline differences between study groups?
4. How important is patient preference in terms of outcomes?
Their study included 18 papers that directly compared the results from RCTs and non-ran-domized studies. The results found no consistent reporting of larger or smaller estimates of treatment effect based on study design. The authors also reported that the intervention type did not seem to be influential; however, they cautioned that more study is needed to vali-date that inference. In RCTs, blanket exclusions were common, and the number of included eligible subjects ranged from 1% to 100%. Large clinical databases containing detailed information of patient severity and prognosis were used instead of RCTs. Where the da-tabase subjects were selected according to the same inclusion criteria as RCTs, the treat-ment effects of the two designs were similar. Documentation of the characteristics of eligible individuals who did not participate in the trials was poorly recorded in most RCTs. Treatment effect measures in RCTs may be exaggerated due to the larger participation of university and teaching centres as compared to non-randomized studies. In non-randomized stud-ies, adjustment for differences often changed the treatment effect size, but not significantly; more importantly, the direction of the change was consistent. Only four papers addressed the role of patient preference on results. In those papers, preference accounted for some of the observed differences between study designs. Britton et al. concluded:
a well-designed non-randomized study is preferable to a small, poorly designed • and exclusive RCT;RCTs should include a wide range of practice settings; • study populations should be representative of all patients receiving the intervention; • exclusions for administrative convenience should be rejected; • differences should be minimized by ensuring that subjects in both kinds of study • are comparable.
Lemmer, Grellier and Stevens (1999) modified the Cochrane Collaboration Protocol for Sys-tematic Reviews for a report that explored the evidence for decision-making and health visits (home health care provision) in Britain. Health visiting and decision-making are influenced
20
by factors such as social and environmental forces, which tend not to be captured in the evidence emerging from randomized control trials. The adapted Cochrane Protocol omitted sections that were felt to be inappropriate for non-RCT research papers. The methodologi-cal rigour and relevance of each article was determined with a numerical score ranging from 8 (maximum) to 1 (minimum). The authors did not provide the variables used to measure methodological rigour. Two members of the research team reviewed each article, and week-ly consensus meetings were held to reach agreement on scoring differences. The scores were intended to provide a rapid overview of the methodology and relevance of each article; however, the final aggregate score provided no indication of the allocation of marks. Review-ers commented on the appropriateness and significance of each article, but these comments were not part of the scoring. The authors found that the scoring system was a useful way to guide discussion at the consensus meetings. Scoring was impacted by the reviewers’ inter-pretation of qualitative methodologies and relevance of articles. They also suggested that this more open approach allowed articles that contained important information or sections to be reviewed in spite of low scores. Some papers contained very brief methods sections, which made the scoring difficult and resulted in low scores.
Acknowledging that synthesizing diverse data in reviews is a challenge, Goldsmith, Bank-head and Austoker (2007) developed a new review method. They implemented it on a re-view of evidence to inform guidelines for the content of patient information related to cervical screening. They followed standard procedures for conducting a systematic review, including:
a priori study design• comprehensive literature search• two people independently reviewing titles and abstracts• quality assessment• data extraction•
The researchers used checklists to perform quality assessment on both quantitative and qualitative studies. Established checklists were used for quantitative studies (Critical Ap-praisals Skills Program (CASP), Scottish Intercollegiate Guidelines Network (SIGN), New Zealand Guidelines Group (NZGG) and UK Government Chief Social Researcher’s Office (UKGCSRO)) (CASP, 2005; Khan et al., 2001; Lethaby, Wells, & Furness, 2001; Spencer, Ritchie, & Lewis, 2004) . The researchers developed a checklist for the qualitative studies that included purpose, population, response rate, outcome definition and assessment. All checklists included a comments section. The methodological quality of studies was rated as ++ (all or most criteria were fulfilled), + (some criteria were fulfilled) or – (few or no criteria were fulfilled). The quality appraisal provided insight into the strengths and weaknesses of each study. However, methodological flaws did not result in studies being excluded. Quality scores were incorporated in the data synthesis phase of the review. Goldsmith et al. (2005) incorporated the synthesis guidelines set out in the GRADING system (Atkins et al., 2005) discussed in the Critical Appraisal Tools section of this paper.
DeWalt et al. (2004) conducted a systematic review of studies that measured literacy plus one or more health outcomes. The eligible study designs were observational: prospective and retrospective cohort studies, case-control studies and cross-sectional studies. Their review followed a standard systematic review process, including:
21
a comprehensive literature search of several electronic databases;• well-defined research questions;• inclusion/exclusion criteria;• two authors reviewing full articles• quality assessment and reporting of findings. •
The researchers adapted the quality assessment tool from West et al. (2002). Each study was rated according to:
study population• comparability of subjects• validity and reliability of the literacy measure• maintenance of comparable groups• outcome measure• statistical analysis• controlling for confounding•
Based on these criterions, the studies received a ‘good,’ ‘fair’ or ‘poor’ rating. This tool has not been validated. Although the review authors were able to provide a statistical analysis of the literacy and health outcomes, they cautioned that the primary study data posed many challenges due to the various reading measures used and cut points for analysis. As well, lack of adequate statistical measures, inadequate controlling for confounders and lack of adjustment for multiple comparisons made comparisons between studies difficult.
Thomson et al. (2006) examined data reporting on socioeconomic determinants of health in the UK to determine if governmental investment in the area had improved health. The data from 10 evaluations were synthesised using systematic review methods, including:
a search strategy• a priori inclusion/exclusion criteria• two people independently reviewing articles• data extraction• data synthesis•
Eight of the impact evaluations used case studies where data were gathered from a few sites to represent the national program. Methodological issues with the evaluation reports in-cluded poor evaluation methods and data sources and low sample sizes. The authors used a narrative synthesis to report findings. They reported that this approach was a new way to build evidence with the message tailored to impact the development of health public policy. The synthesis was challenged by the methodological shortcomings of the included studies.
Two studies (Stein, Dalziel, Garside et al., 2005; Dalziel, Round, Stein et al., 2005) exam-ined the quality of evidence of effectiveness used in health technology assessments (HTA). Case series, commonly used in HTAs, constitute a weak form of evidence in the hierarchy of evidence. In ideal situations, case series data would not be used in systematic reviews; however, there are times when they might be the only available evidence. Finding no simi-
22
lar studies in their literature search, Stein et al. examined a sample of systematic reviews of case series that explored the association between the characteristics of case series and outcome. They did not compare case series results with RCT results. In their analysis, they found little evidence of association between methodological features, sample size or prospective approach and outcome. They provided a narrative report of their findings, and included a table of individual outcomes of the analysed variables. They cautioned that all the interventions studied were surgical, which might limit the generalizability of their find-ings. In a second study, Dalziel et al. compared results from RCTs and case series studies in surgical interventions. They examined reports that included data from both RCTs and case series. They compared these studies using the intervention arm of the RCTs as a comparator: meta-analysis of RCT was compared with weighted robust regressions using the intervention as the confounding factor and estimating the coefficient size. Their analysis revealed that the RCTs showed no outcome difference between treatment types, while the case series showed an increase in mortality of 1–2% between treatments. Limitations of this review include:
analysis was constrained by methodological flaws in the case series studies, such • as poor reporting of data;use of specific surgical interventions may limit the generalizability of findings;• findings were based on a small number of studies. •
In a report that examined evidence for, and methods of, evaluating non-randomized trials, Deeks et al. (2003) identified 194 tools used to assess the quality of these trials. Sixty tools covered at least five of six pre-specified domains for internal validity and were classified as ‘top tools.’ Fourteen tools were ranked as ‘best tools,’ covering three of four core items of particular importance for non-randomized studies (allocation, comparable groups, prognostic factors identified and use of case-mix adjustment). The authors identified six of 14 tools as being suitable for use in systematic reviews. The strength of those tools was the phrasing of items that channelled the reviewer’s responses in a systematic way to ensure the assess-ments were as objective as possible.
Katrak et al. (2004) produced a systematic review of 121 published critical assessment tools from 108 papers located by searching electronic databases and the Internet. Most of the tools (87%) were specific to research design, with many (45%) of those developed for ex-perimental studies (RCTs and CCTs). Of a total of 16 generic critical appraisal tools, six were developed for experimental and observational studies. Eleven tools were found to be useful for any qualitative and quantitative research design. The authors extracted 74 items from 19 critical appraisal tools for observational studies. These items focused on:
data analyses• consideration of confounders• sample size or power calculation• whether appropriate statistical analysis was undertaken•
They extracted 36 items from the seven critical appraisal tools for qualitative studies. Most of the items focused on assessing external validity, methods of data analyses and justifica-tion of the study. These tools did not contain items about sample selection, randomization,
23
blinding, intervention or bias. The generic tools, reportedly usable for either experimental or observational studies, contained items that focused on sampling selection and data analy-ses, such as appropriateness of statistical analyses and sample size and power calculation. This review found no gold standard appraisal tool for any type of study.
Section Summary
These systematic reviews successfully incorporated primary studies with data that did not have control groups, or they examined studies with critical appraisal tools for the quality assessment of studies without control groups. The reviews that examined the feasibility of combining the results of RCTs and non-RCTs suggest that well-designed studies can pro-duce similar results, regardless of the study type. Many of these reviews suggest that poor execution of the primary study design, rather than the study design itself, result in lower quality scoring. They site under-reporting of methodology, poor management of confounders and/or lack of consistent analytical terminology as problematic when trying to compare data between studies.
Methods for Synthesizing Quantitative Studies without Control Groups
This section includes five papers that described methods for synthesizing quantitative stud-ies without control groups.
In a 2005 paper, Greenhalgh et al. described their process of meta-narrative review, devel-oped as a methodology for the synthesis of evidence across disciplines. They used a six-step process that included planning, searching for literature, mapping, doing an appraisal, synthesizing and making recommendations. According to the authors, their approach of producing “storied” accounts of the key research traditions moved complex literature out of confusion and into “sensemaking.” Five key principles underpinned this meta-narrative technique:
pragmatism• pluralism• historicity• contestation• peer review•
This approach may be helpful in situations where the scope of the research is broad and the literature diverse, and where researchers have approached a common problem using differ-ent study designs.
Whittemore and Knafl (2005) highlighted the integrative review method as a good option for integrating evidence from studies that emerge from diverse methodologies. The authors outlined strategies to enhance the rigour of integrative reviews. They proposed five steps to help ensure the methodological rigour of this type of review:
1. Problem identification: Ensures the research question is clearly defined. 2. Literature search: Incorporates a comprehensive search strategy.
24
3. Data evaluation: Becomes somewhat complex due to the methodologically diverse primary studies included in the review. Evaluating quality cannot follow the stan-dard systematic review process, but may need to focus on examples of authentic-ity, methodological quality, informational value and representativeness of available primary studies.
4. Data analysis: Includes data reduction, display, comparison and conclusions. Data reduction is achieved through an overall classification system in which primary studies are divided into subgroups that provide a logical order for analysis. This subgroup classification can be based on types of evidence, chronology, setting, sample characteristics or by a predetermined conceptual classification. The next step in the data reduction process is to code and extract data into a manageable framework. Data display involves converting the extracted data by specific vari-ables and subgroups. These displays include matrices, graphs and charts that allow for comparison across studies. Data comparison identifies patterns, themes and relationships between and amongst the variables. The final stage in data analy-sis is drawing conclusions.
5. Presentation: Includes the reporting of findings in an integrative review that can be in the form of tables or diagrams, and should include implications for practice, policy and research.
Norris and Atkins (2005) examined 49 reviews released during a five-year period by the Evidence-based Practice Centers that included evidence from studies using designs other than RCTs. Those designs included cohort studies and time series, before-after case series studies and cross-sectional studies. The authors suggested that one benefit to incorporating cohort studies and case series studies in reviews is that these types of designs are better able to capture long-term outcomes than are RCTs. They observed that while most of the 49 reviews incorporated non-randomized studies, the evidence tended to be from the RCTs. Assessing the quality of non-randomized trials was a challenge. Of the 49 reviews, 25% did not assess quality, 16% used published checklists or scoring systems, 10% adapted pub-lished scoring instruments and 49% used checklists that had been developed by the review-ers. Few of the tools had been tested for validity. To ensure adherence to principles of best evidence and to reduce bias, Norris and Atkins suggested that reviews begin with a detailed protocol which would then be strictly followed from review inception to completion. As well, the search strategy may need to be more fluid and repeated to compensate for the lack of sensitive and specific search strategies used for RCTs. The authors made the following recommendations:
1. Reviewers should assess the availability of RCTs addressing their review question before determining final inclusion criteria.
2. When considering the inclusion of non-randomized trials, reviewers need to consid-er potential biases and whether those biases can be minimized in well-conducted non-randomized trials.
3. The review process must be transparent so that reasons for inclusion and exclusion of study designs are explicit.
4. Reviewers should consider quality assessment of individual studies.
25
5. Reviewers should be aware of the impact of their inclusion criteria on the findings, and include a discussion of the potential impacts of bias on their conclusions.
6. Methodological quality of the included studies needs to be part of the discussion and conclusions, regardless of study design.
Jackson and Waters (2005) looked specifically at the challenges faced in writing systematic reviews for the public health sector. Some of the inherent difficulties included multi-compo-nent interventions, multiple outcomes measured, diverse populations and mixed study de-signs. They agreed with others who say that in public health, the intervention and the popu-lation need to dictate the study design, not vice versa. The researchers acknowledged the controversy around the appropriateness of the systematic review process for public health intervention studies. They also offered suggestions for improving the process as a means of moderating that criticism. Searching for literature is complex and time consuming. Therefore, it is essential to allow sufficient time for an adequate search, as well as to search multiple electronic databases and sources of grey literature. Quality assessment of all included ar-ticles is necessary, and they recommended using the tool developed by the Effective Public Health Practice Project, Hamilton, Ontario (Thomas et al., 2004a) “Quality Assessment Tool for Quantitative Studies.” This strong instrument (Deeks et al., 2003) can be used on RCTs, quasi-experimental and uncontrolled studies. Attention to theoretical frameworks in both primary studies and systematic reviews can help explain the differences that occur between the planning and the outcomes of the interventions. Measuring the integrity of inventions can help determine if an intervention was ineffective because of poor planning, or because the delivery of the intervention was incomplete. Due to the diverse populations receiving public health interventions, researchers should expect heterogeneity. Researchers and reviewers will not find all these components readily incorporated in all primary studies, but aware-ness of these components may help to strengthen systematic reviews emerging from public health intervention research.
Atkins and DiGuiseppi’s 1998 paper examined research needs about preventative health services use. Evidence from RCTs was not always available in the area of health promotion and preventative health. The researchers cited the known problems associated with obser-vational studies as reason for caution; however, they acknowledged that these studies can add to the body of knowledge. The strengths of observational studies include:
establishing linkages in the causal pathway;• building understanding of the natural history of disease;• identifying risk factors;• measuring compliance with and adverse effects of treatments;• determining accuracy of diagnostic tests;• assessing efficacy of interventions. •
Risk of bias can be modified with representative, population-based samples and careful con-sideration of potential confounders. Researchers also need to recognise that observational studies tend to exaggerate beneficial effects of treatment. The authors recommend including a large number of well-designed studies with consistent and preferably large effects when using observational studies to build treatment recommendations.
26
Section Summary
Synthesizing and integrating quantitative data from studies without control groups is chal-lenging. These six papers reported on various methods to incorporate such data in a useful format. The authors agreed that there is benefit to incorporating evidence from diverse study designs, but quality assessment is paramount for providing trustworthy evidence. Primary studies that clearly identify potential biases and probable impacts on effect can be usefully incorporated into systematic reviews.
Combining Qualitative and Quantitative Studies
This section examines three articles that describe methods for synthesizing data from both qualitative and quantitative studies.
Determining the quality of non-experimental studies is challenging and controversial; how-ever, these types of studies often contain important information for public health. Assessing quality using standard tools is not particularly helpful because few of these studies would score in the high or moderate category. Harden et al. (2004) used a standard approach for methodological quality assessment in systematic reviews on 35 primary non-intervention studies. Only four met all seven criteria used to assess quality. They developed a data extraction tool to help deconstruct each study (both quantitative and qualitative). From this, they reconstructed the results and presented them in structured summaries and evidence ta-bles. The synthesis process was non-linear and involved two reviewers going back and forth between the papers, the data extraction and the evidence tables. Pooling results, as done in meta-analysis, was not appropriate. Instead, these researchers used aggregate methods according to identified themes to synthesize findings. To answer the main research question, the reviewers integrated the findings from the non-intervention studies with 36 intervention studies, comparing the results to identify similarities and differences (Oliver et al., 2005). Using these two types of results allowed for greater breadth of perspectives and deeper understanding of public health issues from the point of view of the people who receive the targeted interventions.
Meta-reviews, as described by Wong and Raabe (1996), incorporate both qualitative and quantitative data. The authors suggested using a meta-review to address some of the limi-tations associated with traditional qualitative reviews, such as subjectivity in article selec-tion and weights assigned to individual studies, and lack of quantitative analysis. This was achieved by including the meta-analysis of quantitative data within a narrative review. The first factor for determining whether a study should be included is the study design; studies with different designs should not be grouped together for meta-analysis. Meta-analysis can increase statistical power, but it cannot compensate for other study design methodological issues. There are some basics steps in conducting a meta-review:
1. Define the research question.2. Conduct a comprehensive literature review.3. Create a priori inclusion criteria.4. Conduct a traditional qualitative review.
27
5. Conduct a quantitative meta-analysis.6. Integrate a traditional qualitative review with quantitative meta-analysis.7. Apply criteria for causation in interpretation.
The authors suggest that incorporating a meta-analysis with a traditional narrative review can reduce subjectivity.
A 2007 paper by Sandelowski, Barroso, & Voils, discussed the need for health disciplines to include both qualitative and quantitative research findings in reviews, especially in light of the growth of mixed methods research design. The authors examined 42 reports, including jour-nal articles, unpublished dissertations/theses and technical reports. To differentiate their ap-proach from a standard systematic review, no report was excluded based on methodological quality. To synthesize their findings, they developed a qualitative meta-synthesis model that grouped primary research findings in topical and thematic units. The meta-summary included the extracting, grouping and formatting of findings, as well as the calculating of frequency and effect sizes for the quantitative data. This approach can be used to synthesize mixed meth-ods surveys in which the data collection and analysis procedures are similar.
Section Summary
Incorporating qualitative and quantitative data in one report can be helpful for building an understanding of the success or failure of interventions. However, assessing the quality of studies without control groups can be problematic, especially if quality is assessed with traditional methods used in systematic reviews or meta-analysis. Results of quality assess-ment were not used as exclusion criteria, but were instead incorporated in the discussion of findings. Grouping findings according to a thematic and aggregate method is an appropri-ate alternative approach to meta-analysis. Researchers could incorporate some systematic review processes, such as having two people review and perform data extraction, but most authors suggest using a narrative approach for presenting results when meta-analysis is not appropriate.
Studies Comparing Results Using Differing Methods
The search included two studies that examined issues that arise when comparing results of randomized and non-randomized studies of the same intervention.
Jefferson and Demicheli (1999) assessed the capability of different research designs for testing four aspects of vaccine performance (immunogenicity, duration of immunity con-ferred, incidence and seriousness of side effects, and number of infections prevented by vaccination). Their findings indicated that in vaccinology, experimental and non-experimental study designs are frequently complementary. However, in some situations, vaccine quality could only be measured with one type of study. For instance, an RCT is the preferred study design to measure side effects. Non-experimental designs are appropriately applied when:
an experiment is impossible, unnecessary or inappropriate;• the individual efficacy is to be measured in terms of infrequent adverse events;• when interventions prevent rare events or the population effectiveness of an inter-•
28
vention to be measured in the long term.
Using interview data collected from 17 authors or users of Health Technology Assessments (HTA), Rotstein and Laupacis (2004) shed light on the differences between HTAs and sys-tematic reviews. These differences include:
1. Methodological standards—HTAs may include literature of poor methodological quality if a topic is important to decision-makers.
2. Replication of previous studies—systematic reviews do not need to be repeated if previous studies were of high-quality or when there is no new high-quality evi-dence; in HTAs there is often a need to repeat studies to defend the report’s con-clusions.
3. Choice of topics—topics are more policy-oriented with HTAs, while systematic re-views tend to be driven by effectiveness questions.
4. Inclusion of content experts (in systematic reviews) and policy-makers (in HTAs) as authors.
5. Inclusion of economic evaluations—in HTAs.6. Making policy recommendations—in HTAs.7. Dissemination of report—more often actively done for HTAs.
HTAs are not specifically designed for evaluating scientific evidence, while systematic re-views are designed for this purpose. Yet the impartial nature of scientific investigation would help strengthen the policy decisions emerging from HTA evidence. Different levels of evi-dence (below RCTs in the hierarchy) are more acceptable in HTAs.
Section Summary
These two comparisons of results from different designs about the same intervention point out that it is useful to consider different designs in reviews, not only because they may be the best available information, but also because they can more likely answer questions re-lated to long-term effectiveness, adverse events and rare events. The strengths of including other designs outweigh the limitations.
29
Discussion
The main research focus for this paper was the methods of including results from non-randomized trials, as well as the quality and synthesis of that data. This is an important research question because in many areas of health and public health, evidence from ran-domized control trials is not available. Historically, evidence emerging from non-randomized trials has been criticized as unreliable due to a greater potential of bias. It is commonly understood that there is a greater tendency for selection, allocation or attrition bias in non-randomized trials. Readers of data from non-randomized trials need to be aware that these biases may exist, and researchers should account for these potential biases in their studies. Some suggestions emerging from the studies examined in this paper include:
There is a need for a strong and transparent study design in all non-randomized • trials.The research question should determine the study design.• Study authors need to incorporate detailed information about the study; lack of • information means that readers cannot determine what has or has not been done in the study design.There is a need for consistency in terminology and descriptions across non-ran-• domized studies to facilitate comparisons.
There are several methodological approaches for including data from non-randomized stud-ies. We found the work of Whittemore and Knafl (2005) to be most systematic. Whittemore and Knafl provide a five-step integrative review method that incorporates problem identifica-tion, literature search, data evaluation, data analysis and presentation of data.
Many researchers use various assessment tools to determine the quality of the included studies. Norris and Atkins (2005) provide a comprehensive list of recommendations for assessing study quality that emerged from the 49 reviews they examined. These recom-mendations were outlined earlier in this paper. Other authors used existing checklists or scoring instruments, such as MINORS or GRADING. Some adapted tools to conform to the study designs found in the primary studies, while others developed new tools that reflected the information to be extracted from included studies. Many of these tools have not been tested for validity or reliability. We recommend that further research be done to determine the validity and reliability of these quality assessment tools for non-randomized trials. Where possible, reviewers should use tools that have been tested and determined to be valid and reliable for non-randomized trials, such as the Quality Assessment Instrument for Primary Studies developed by the Effective Public Health Practice Project, Hamilton, Ontario.
Including data from non-randomized trials is challenging; however, several benefits can be achieved from its inclusion. Drawing from non-randomized studies means that researchers can add a wider body of literature in their search strategies (Thomas et al., 2004). Incorpo-rating these studies may broaden the scope of the questions that can be answered. This research can shed light on why, how and for whom interventions or programs succeed or fail (Oliver et al., 2005).
30
Recommendations
There is currently no standardized model for synthesizing the results of studies that do not have control groups. This paper examined synthesis methods, critical appraisal tools and studies that deal with appraising and synthesizing the results from quantitative studies that lack control groups.
This paper outlined some precautions to take when using non-controlled studies as evi-dence, and made recommendations for processes and tools:
1. Incorporate studies without control groups into systematic reviews, especially when there are no other studies to consider. Studies without control groups can also pro-vide information on long-term effectiveness, rare events and adverse effects.
2. Do not generalize the direction of differences in outcome between randomized and non-controlled studies. While effect sizes are more often smaller in randomized controlled trials (RCTs), some show the reverse or the same results across study designs.
3. Use Whittemore and Knafl’s (2005) approach to include results of non-randomized trials. Whittemore and Knafl provide a five-step integrative review method that incorporates problem identification, literature search, data evaluation, data analysis and presentation of data.
4. Recommendations for methods of systematic review including studies without con-trol groups:
a) Use a critical appraisal tool that has been tested for internal consistency, test-retest and inter-rater reliability, and at least face and criterion validity (Downs & Black, 1998; Slim, Nini, Forestier et al., 2003; Steuten, Vrijhoef, van-Merode et al., 2004; Thomas, Ciliska, Dobbins et al., 2004a; Zaza, Wright-De Aguero, Briss et al., 2000).
b) Do not meta-analyse results from observational studies.5. Suggestions to authors of primary studies:
a) Ensure a strong and transparent study design in all non-randomized trials.b) Confirm that the research question determines the study design.c) Incorporate detailed information about the study; lack of information means
that readers cannot determine what has or has not been done in the study design.
d) Ensure consistency in terminology and descriptions across non-random-ized studies to facilitate comparisons.
6. Implications for further research:a) Conduct reliability and validity testing of any critical appraisal tools. b) Conduct further studies to assess the different results obtained by random-
ized versus non-randomized trials. 7. Including data from non-randomized trials is challenging; however, several benefits
can be achieved:
31
a) A wider body of literature can be included in search strategies (Thomas, Harden, Oakley et al., 2004).
b) The scope of the questions that can be answered may be broadened. c) This research can shed light on why, how and for whom interventions or
programs succeed or fail (Oliver, Harden, Rees et al., 2005).
32
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esR
ange
l, et
al.
2003
US
A
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
To d
escr
ibe
the
deve
lopm
ent a
nd
pote
ntia
l app
licat
ions
of
a s
tand
ardi
zed
qual
ity a
sses
smen
t sc
ale
desi
gned
for
retro
spec
tive
stud
ies
in p
edia
tric
surg
ery.
Dev
elop
ed a
com
preh
ensi
ve q
ualit
y as
sess
men
t ins
trum
ent i
ncor
pora
ting
30 it
ems
with
in th
ree
subs
cale
s.
The
subs
cale
s w
ere
desi
gned
to a
sses
s cl
inic
al re
leva
nce,
repo
rting
met
hodo
logy
and
the
stre
ngth
of c
oncl
usio
ns.
Glo
bal q
ualit
y ra
tings
(poo
r, fa
ir or
goo
d) w
ere
deriv
ed b
y co
mbi
ning
sco
res
from
eac
h su
b-sc
ale.
Six
inde
pend
ent r
evie
wer
s ex
amin
ed in
ter-
rate
r rel
iabi
lity
by a
sses
sing
the
inst
rum
ent i
n 10
re
trosp
ectiv
e st
udie
s fro
m p
edia
tric
surg
ery
liter
atur
e.
Find
ings
:
The
inst
rum
ent h
ad e
xcel
lent
inte
r-ra
ter r
elia
bilit
y.•
Ther
e w
ere
high
leve
ls o
f agr
eem
ent f
or a
ll ite
ms
with
in in
stru
men
t (84
.6%
con
cord
ance
, n
•=
1573
item
s) a
nd fo
r all
indi
vidu
al s
ubsc
ales
(ran
ge, 7
3.3%
to 8
5.8%
, n =
60
to 1
258)
Con
clus
ions
:
The
auth
ors
have
dev
elop
ed a
sta
ndar
dize
d an
d •
relia
ble
qual
ity a
sses
smen
t sca
le fo
r the
ana
lysi
s of
retro
spec
tive
data
in p
edia
tric
surg
ery.
Res
ults
may
not
be
gene
raliz
able
to o
ther
•
inte
rven
tions
.
Pot
entia
l app
licat
ions
:
Pro
vide
pra
ctic
ing
surg
eons
with
a k
now
ledg
e •
base
to c
ritic
ally
eva
luat
e pu
blis
hed
retro
spec
-tiv
e da
ta;
Pro
vide
sta
ndar
dize
d m
etho
dolo
gy fo
r SR
of
•ex
istin
g re
tro d
ata;
Dev
elop
sta
ndar
dize
d re
porti
ng g
uide
lines
for
•us
e in
pee
r-re
view
ed jo
urna
ls.
Mar
getts
, et a
l.
1995
UK
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
To d
evel
op a
sco
ring
syst
em to
hel
p ju
dge
thescientificqual
-ity
of o
bser
vatio
nal
epid
emio
logi
c st
udie
s lin
king
die
t with
risk
of
canc
er.
The
scor
ing
syst
em w
as d
evel
oped
from
key
hea
ding
s us
ed in
dev
elop
ing
rese
arch
pro
toco
ls
and
incl
uded
que
stio
ns u
nder
hea
ding
s:
thre
e fo
r cas
e co
ntro
l stu
dies
(die
tary
ass
essm
ent,
recr
uitm
ent o
f sub
ject
s an
d an
alys
is) –
•
scor
ing
syst
em in
App
endi
x A
;
fourforcohortstudies(dietaryassessm
ent,definitionofcohort,ascertainmentandanaly
-•
sis)
– s
corin
g sy
stem
in A
ppen
dix
B.
Inte
r-ob
serv
er v
aria
tion
was
ass
esse
d:
Ther
e w
as g
ood
agre
emen
t bet
wee
n ob
serv
ers
in th
e ra
nkin
g of
stu
dies
.
The
scor
ing
syst
em w
as a
pplie
d to
a re
view
of m
eat a
nd c
ance
r ris
k:
From
whatseemedlikealargeliteraturesample,relativelyfewstudiesscoredwell(definedas
a sc
ore
> 65
%).
How
ever
, the
se s
tudi
es te
nded
to p
rovi
de m
ore
cons
iste
nt in
form
atio
n.
For c
ase
cont
rol s
tudi
es, 3
4 ou
t of 1
06 s
tudi
es s
core
d >
65%
.
For c
ohor
t stu
dies
, 10
out o
f 41
stud
ies
scor
ed >
65%
.
The
prot
otyp
e sc
orin
g sy
stem
repo
rted
here
hel
ped
the
auth
ors
desc
ribe
the
epid
emio
logi
c da
ta o
n di
et
and
canc
er, b
ut it
is n
ot g
ener
aliz
able
to o
ther
set
-tin
gs o
r int
erve
ntio
ns.
Table summarizing results of relevant papers
33
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esD
owns
& B
lack
1998
UK
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
To te
st th
e fe
asib
ility
of
cre
atin
g a
valid
and
re
liabl
e ch
eckl
ist w
ith
the
follo
win
g fe
atur
es:
App
ropr
iate
for a
s-se
ssin
g bo
th ra
ndom
-iz
ed a
nd n
on-r
ando
m-
ized
stu
dies
.
Pro
vide
s bo
th a
n ov
eral
l sco
re fo
r stu
dy
qualityandaprofile
of s
core
s, n
ot o
nly
for
the
qual
ity o
f rep
ortin
g,
inte
rnal
val
idity
and
po
wer
, but
als
o fo
r ex
tern
al v
alid
ity.
The
stud
y be
gan
with
a p
ilot v
ersi
on o
f the
tool
. The
rese
arch
ers
aske
d th
ree
expe
rts to
eva
lu-
ate
face
and
con
tent
val
idity
. Afte
rwar
d, tw
o ra
ters
use
d th
e to
ol to
ass
ess
10 ra
ndom
ized
and
10
non
-ran
dom
ized
stu
dies
to m
easu
re re
liabi
lity.
Find
ings
:
Usi
ng d
iffer
ent r
ater
s, th
e ch
eckl
ist w
as re
vise
d an
d te
sted
for i
nter
nal c
onsi
sten
cy (K
uder
-•
Ric
hard
son
20),
test
-ret
est a
nd in
ter-
rate
r rel
iabi
lity,
crit
erio
n va
lidity
and
resp
onde
nt
burd
en.
The
rese
arch
ers
foun
d hi
gh in
tern
al c
onsi
sten
cy in
Qua
lity
Inde
x.
•
Test
-ret
est a
nd in
ter-
rate
r rel
iabi
lity
of th
e Q
ualit
y In
dex
wer
e go
od.
•
Con
clus
ions
:
It is
pos
sibl
e to
pro
duce
a c
heck
list t
hat e
valu
-•
ates
rand
omiz
ed a
nd n
on-r
ando
miz
ed s
tudi
es.
The
mai
n co
ncer
n is
with
the
chec
klis
t’s e
xter
nal
•va
lidity
and
app
licat
ion
of it
to to
pics
bey
ond
the
stud
y’s
purp
ose.
Slim
, et a
l.
2003
Fran
ce
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
To d
evel
op a
nd
valid
ate
a M
etho
d-ol
ogic
al In
dex
for N
on-
Ran
dom
ized
Stu
dies
(M
INO
RS
) whi
ch c
ould
be
use
d by
read
ers,
m
anus
crip
t rev
iew
ers
or jo
urna
l edi
tors
to
asse
ss th
e qu
ality
of
such
stu
dies
.
The
inde
x co
nsis
ted
of 1
2 ite
ms.
In th
e ca
se o
f non
-com
para
tive
stud
ies,
item
s in
clud
ed:
a st
ated
aim
of t
he s
tudy
;•
incl
usio
n of
con
secu
tive
patie
nts;
•
pros
pect
ive
colle
ctio
n of
dat
a;•
endp
oint
app
ropr
iate
to th
e st
udy
aim
;•
unbi
ased
eva
luat
ion
of e
ndpo
ints
;•
follo
w-u
p pe
riod
appr
opria
te to
the
maj
or e
ndpo
int;
•
loss
to fo
llow
-up
not e
xcee
ding
5%
.•
In th
e ca
se o
f com
para
tive
stud
ies,
item
s in
clud
ed:
a co
ntro
l gro
up h
avin
g th
e go
ld s
tand
ard
inte
rven
tion;
•
cont
empo
rary
gro
ups;
•
base
line
equi
vale
nce
of g
roup
s;•
pros
pect
ive
calc
ulat
ion
of th
e sa
mpl
e si
ze;
•
stat
istic
al a
naly
ses
adap
ted
to th
e st
udy
desi
gn.
•
Theindexhadgoodinter-review
eragreement,hightest-retestreliabilitybythekappa-coeffi-
cientandgoodinternalconsistencybyCronbach’salphacoefficient.
MIN
OR
S is
a v
alid
inst
rum
ent d
esig
ned
to a
sses
s th
e m
etho
dolo
gica
l qua
lity
of n
on-r
ando
miz
ed s
urgi
cal
stud
ies,
whe
ther
com
para
tive
or n
on-c
ompa
rativ
e.
Res
earc
hers
may
nee
d to
use
cau
tion
if ap
plyi
ng th
e in
stru
men
t to
non-
surg
ical
inte
rven
tions
.
34
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esZa
za e
t al.
2000
US
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
1. T
o as
sist
with
co
nsis
tenc
y, re
duce
bi
as a
nd im
prov
e va
lidity
for r
esea
rch-
ers
usin
g th
e G
uide
to
Com
mun
ity
Pre
vent
ion
Ser
vice
s:
Sys
tem
atic
Rev
iew
s an
d E
vide
nce-
Bas
ed
Rec
omm
enda
tions
(th
e G
uide
)
2. T
o de
velo
p a
stan
dard
ized
dat
a ex
tract
ion
form
to u
se
with
the
Gui
de.
The
form
was
dev
elop
ed b
y:
revi
ewin
g m
etho
dolo
gies
from
oth
er s
yste
mat
ic re
view
s;•
repo
rting
sta
ndar
ds e
stab
lishe
d by
hea
lth a
nd s
ocia
l sci
ence
jour
nals
;•
exam
inin
g ev
alua
tion,
sta
tistic
al a
nd m
eta-
anal
ysis
lite
ratu
re;
•
solic
iting
exp
ert o
pini
ons.
•
The
form
was
use
d to
ass
ess
the
met
hodo
logi
cal q
ualit
y of
prim
ary
stud
ies
(23
ques
tions
) and
to
cla
ssify
and
des
crib
e ke
y ch
arac
teris
tics
of th
e in
terv
entio
n an
d ev
alua
tion
(26
ques
tions
).
The
rese
arch
ers
exam
ined
stu
dy p
roce
dure
s an
d re
sults
and
ass
esse
d th
reat
s to
val
idity
ac
ross
six
cat
egor
ies:
inte
rven
tion
and
stud
y de
scrip
tions
•
sam
plin
g•
mea
sure
men
t•
anal
ysis
•
inte
rpre
tatio
n of
resu
lts•
othe
r exe
cutio
n is
sues
•
This
pro
cess
impr
oved
the
valid
ity a
nd re
liabi
lity
of
the
Gui
de.
The
stan
dard
ized
dat
a ex
tract
ion
form
can
ass
ist
rese
arch
ers
and
read
ers
of p
rimar
y st
udie
s to
revi
ew
the
cont
ent a
nd q
ualit
y of
the
stud
ies.
The
form
can
al
so a
ssis
t in
the
deve
lopm
ent o
f man
uscr
ipts
for
subm
issi
on to
pee
r rev
iew
ed jo
urna
ls.
Gre
er, e
t al.
2000
US
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
(evi
denc
e gr
adin
g sy
stem
s)
To d
escr
ibe
in d
etai
l th
e ev
iden
ce a
nd
conc
lusi
on g
radi
ng
syst
em d
evel
oped
by
the
Inst
itute
for C
linic
al
Sys
tem
s Im
prov
emen
t (IC
SI)
for u
se b
y th
e pr
actic
ing
clin
icia
ns
who
writ
e th
e do
cu-
men
ts a
nd u
se th
em
in m
akin
g de
cisi
ons
abou
t pat
ient
car
e
The
grad
ing
syst
em in
clud
ed a
n ev
alua
tion
of in
divi
dual
rese
arch
repo
rts a
nd a
n as
sess
men
t of
the
over
all s
treng
th o
f the
evi
denc
e su
ppor
ting
a pa
rticu
lar c
oncl
usio
n or
reco
mm
enda
tion.
The
ratin
g sy
stem
to a
sses
s th
e qu
ality
of i
ndiv
idua
l res
earc
h re
ports
can
be
quic
kly
unde
rsto
od
and
mor
e ea
sily
use
d by
pra
ctic
ing
phys
icia
ns.
The
inte
nt o
f the
gra
ding
sys
tem
is to
hig
hlig
ht a
repo
rt th
at is
unu
sual
ly s
trong
or m
arke
dly
flawed.
The
evid
ence
gra
ding
sys
tem
:
incl
udes
the
conc
lusi
on g
radi
ng w
orks
heet
, whi
ch c
alls
for s
tate
men
t of a
con
clus
ion,
a
•su
mm
ary
of re
sear
ch re
ports
that
sup
port
or d
ispu
te th
e co
nclu
sion
, ass
ignm
ent o
f cla
sses
an
d qu
ality
mar
kers
to th
e re
sear
ch re
ports
, and
ass
ignm
ent o
f a g
rade
to th
e co
nclu
sion
;
has
been
use
d in
the
writ
ing
of m
ore
than
40
guid
elin
es a
nd n
umer
ous
tech
nolo
gy a
sses
s-•
men
t rep
orts
.
Con
clus
ions
:
The
syst
em a
ppea
rs to
be
succ
essf
ul in
•
over
com
ing
the
com
plex
ity o
f som
e pu
blis
hed
syst
ems
of g
radi
ng e
vide
nce,
whi
le s
till y
ield
ing
a defensibleclassificationofconclusionsbasedon
the
stre
ngth
of t
he u
nder
lyin
g ev
iden
ce.
The
syst
em’s
relia
bilit
y ne
eds
to b
e rig
orou
sly
•te
sted
.
35
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esS
teut
en, e
t al.
2004
Net
herla
nds
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
To d
escr
ibe
to w
hat
exte
nt e
xist
ing
inst
rum
ents
are
use
-fu
l in
asse
ssin
g th
e m
etho
dolo
gica
l qua
lity
of H
ealth
Tec
hnol
ogy
Ass
essm
ent (
HTA
) of
dise
ase
man
agem
ent.
Problem
swereidentifiedintheassessmentofthemethodologicalqualityofsixHTA
sofdis
-ea
se m
anag
emen
t with
thre
e di
ffere
nt in
stru
men
ts. T
he p
robl
ems
mai
nly
conc
erne
d:
stud
y de
sign
(RC
T ve
rsus
obs
erva
tiona
l stu
dies
); •
crite
ria fo
r sel
ectio
n an
d re
stric
tion
of p
atie
nts;
•
base
line
and
outc
ome
mea
sure
s (s
ever
al p
aram
eter
s ar
e ne
cess
ary)
;•
blin
ding
of p
atie
nts
and
prov
ider
s (w
hich
is im
poss
ible
in d
isea
se m
anag
emen
t);•
desc
riptio
n of
com
plex
, mul
tifac
eted
inte
rven
tions
and
avo
idan
ce o
f co-
inte
rven
tions
.•
The
rese
arch
ers
prop
osed
and
val
idat
ed a
new
inst
rum
ent,
HTA
-DM
, tha
t:
incl
udes
four
com
pone
nts
(stu
dy p
opul
atio
n, d
escr
iptio
n of
inte
rven
tion,
mea
sure
men
t of
•ou
tcom
es a
nd d
ata-
anal
ysis
/pre
sent
atio
n of
dat
a).
cont
ains
15
item
s (o
n in
tern
al v
alid
ity, e
xter
nal v
alid
ity, s
tatis
tical
con
side
ratio
ns).
•
Cho
u &
Hel
fand
2005
US
A
MET
HO
DS
(Sys
tem
atic
R
evie
ws
(SR
) of
har
ms)
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
To re
view
the
met
hodo
logi
cal c
hal-
leng
es in
per
form
ing
SR
s of
har
ms
and
high
light
exa
mpl
es o
f ap
proa
ches
to th
em
from
96
EP
C e
vide
nce
repo
rts.
Som
e st
udie
s ha
ve fo
und
that
obs
erva
tiona
l stu
dies
and
RC
Ts re
port
sim
ilar e
stim
ates
of e
f-fe
cts.
Oth
ers
have
foun
d th
at c
linic
al tr
ials
repo
rt hi
gher
risk
s fo
r adv
erse
eve
nts
than
obs
erva
tiona
l st
udie
s (p
ossi
bly
due
to p
oore
r ass
essm
ent o
f har
m in
obs
erva
tiona
l stu
dies
, or t
hey
may
be
mor
e lik
ely
to b
e pu
blis
hed
if th
ey re
port
good
met
hods
/ res
ults
).
Incl
udin
g la
rge
data
base
s m
ay p
rovi
de u
sefu
l inf
orm
atio
n ab
out h
arm
.
Dat
a fro
m p
ract
ice-
base
d ne
twor
ks a
re o
ften
riche
r in
clin
ical
det
ail t
han
adm
inis
trativ
e da
taba
s-es—itmaybepossibletoidentifyandmeasurelikelyconfounderswithmoreconfidence,but
theyarehardertofindusingelectronicsearchesbecausemanysuchanalysesareproprietary.
Incl
udin
g ca
se re
ports
can
hel
p id
entif
y un
com
mon
, une
xpec
ted
or lo
ng-te
rm a
dver
se d
rug
even
ts th
at a
re o
ften
diffe
rent
from
thos
e de
tect
ed in
clin
ical
tria
ls.
Ass
essi
ng th
e qu
ality
of h
arm
repo
rting
:
Sev
eral
SR
s fo
und
that
pro
spec
tive
or re
trosp
ectiv
e de
sign
, cas
e-co
ntro
l com
pare
d w
ith
•co
hort
stud
ies,
and
sm
alle
r com
pare
d w
ith la
rger
cas
e se
ries
had
no c
lear
effe
ct o
n es
ti-m
ates
of h
arm
.
Bet
ter d
ata
abou
t har
m is
nee
ded
to c
ondu
ct b
al-
ance
d S
Rs.
Furth
er re
sear
ch is
nee
ded
to e
mpi
rical
ly d
eter
min
e th
e im
pact
of i
nclu
ding
dat
a fro
m d
iffer
ent s
ourc
es o
n as
sess
men
ts o
f har
m, a
nd fu
rther
dev
elop
and
test
cr
iteria
for r
atin
g th
e qu
ality
of h
arm
repo
rting
.
It is
impo
rtant
to:
avoi
d in
appr
opria
te s
tatis
tical
com
bina
tions
of
•da
ta;
care
fully
des
crib
e th
e ch
arac
teris
tics
and
qual
ity
•of
incl
uded
stu
dies
;
thor
ough
ly e
xplo
re p
oten
tial s
ourc
es o
f het
ero-
•ge
neity
.
36
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esM
cInt
osh,
et a
l.
2004
UK
MET
HO
DS
(SR
s)
To p
rese
nt th
e ex
peri-
ence
of c
ondu
ctin
g sy
stem
atic
revi
ews
(SR
s) o
f har
mfu
l ef
fect
s an
d to
mak
e su
gges
tions
for f
utur
e pr
actic
e an
d fu
rther
re
sear
ch.
The
auth
ors
eval
uate
d th
e m
etho
ds u
sed
in th
ree
SR
s, fo
cusi
ng o
n th
e re
view
que
stio
n, s
tudy
de
sign
s an
d qu
ality
ass
essm
ent.
Rev
iew
que
stio
n:
Onereview
hadaspecificquestion,focusedonprovidinginformationonspecificharmful
•ef
fect
s to
furn
ish
an e
cono
mic
mod
el; t
he o
ther
two
revi
ews
had
broa
der q
uest
ions
.
Stu
dy d
esig
ns:
Allthreeincludedrandom
izedandobservationaldata,buteachdefinedtheinclusion
•cr
iteria
diff
eren
tly.
Qua
lity
asse
ssm
ent:
App
lied
publ
ishe
d ch
eckl
ists
—en
coun
tere
d pr
oble
ms;
•
Inad
equa
te re
porti
ng o
f bas
ic d
esig
n fe
atur
es o
f the
prim
ary
stud
ies—
chec
klis
ts o
mit
key
•fe
atur
es, s
uch
as h
ow h
arm
ful e
ffect
s da
ta w
ere
reco
rded
.
Key
are
as fo
r im
prov
emen
t inc
lude
:
focu
sing
the
revi
ew q
uest
ion,
as
broa
d an
d no
n-•specificquestionsarenothelpfultotheprocess
of re
view
;
deve
lopi
ng s
tand
ardi
zed
met
hods
for t
he q
ualit
y •
asse
ssm
ent o
f stu
dies
of h
arm
ful e
ffect
s.
Atk
ins,
et a
l.
2005
US
A
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
(gra
ding
sys
tem
)
To p
ilot t
est a
nd fu
rther
de
velo
p th
e G
RA
DE
ap
proa
ch to
gra
ding
ev
iden
ce a
nd re
com
-m
enda
tions
.
The
GR
AD
E a
ppro
ach
was
dev
elop
ed b
y th
e G
RA
DE
Wor
king
Gro
up to
gra
de th
e qu
ality
of
evid
ence
and
the
stre
ngth
of r
ecom
men
datio
ns.
Researchersused12evidenceprofilesinthispilotstudy.
Eachprofilewasmadebasedoninformationavailableinasystematicreview
.
17 p
eopl
e in
depe
nden
tly g
rade
d th
e le
vel o
f evi
denc
e an
d th
e st
reng
th o
f rec
omm
enda
tions
for
eachofthe12evidenceprofiles.
Qua
lity
of e
vide
nce
for e
ach
outc
ome:
Res
earc
hers
foun
d th
at in
add
ition
to s
tudy
des
ign,
qua
lity,
con
sist
ency
and
dire
ctne
ss,
•otherqualitycriteriaalsoinfluencedjudgmentsaboutevidence;sparsedata,strongas-
soci
atio
ns, p
ublic
atio
n bi
as, d
ose
resp
onse
and
situ
atio
ns w
here
all
plau
sibl
e co
nfou
nder
s strengthenedratherthanweakenedconfidenceinthedirectionoftheeffect.
Rel
iabi
lity:
Ther
e w
as a
var
ied
amou
nt o
f agr
eem
ent o
n th
e qu
ality
of e
vide
nce
for t
he o
utco
mes
•relatingtoeachofthetwelvequestions(kappacoefficientsforagreementbeyondchance
rang
ed fr
om 0
to 0
.82)
.
Ther
e w
as fa
ir ag
reem
ent a
bout
the
rela
tive
impo
rtant
of e
ach
outc
ome.
•
Therewaspooragreementaboutthebalanceofbenefitsandharmsandrecommendations
•(c
ould
, in
part,
be
expl
aine
d by
the
accu
mul
atio
n of
all
the
prev
ious
diff
eren
ces
in g
radi
ng
of th
e qu
ality
and
impo
rtanc
e of
the
evid
ence
).
Mos
t of t
he d
iscr
epan
cies
wer
e ea
sily
reso
lved
thro
ugh
disc
ussi
on.
•
37
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esK
han,
et a
l.
2001
UK
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
To p
rovi
de b
ackg
roun
d in
form
atio
n on
stu
dy
qual
ity a
nd d
escr
ibe
how
to d
evel
op
qual
ity a
sses
smen
t ch
eckl
ists
.
The
rese
arch
ers
desc
ribed
reas
ons
to in
clud
e st
udy
qual
ity a
sses
smen
t in
revi
ews,
type
s of
bi
ases
, som
e m
etho
ds to
pro
tect
aga
inst
bia
s in
prim
ary
effe
ctiv
enes
s st
udie
s an
d ho
w q
ualit
y as
sess
men
t ins
trum
ents
sho
uld
be d
evel
oped
.
Qua
lity
crite
ria fo
r ass
essm
ent o
f obs
erva
tiona
l stu
dies
incl
udes
ans
wer
ing
a se
ries
of q
ues-
tions
for c
ohor
t stu
dies
, cas
e-co
ntro
l stu
dies
and
cas
e se
ries.
Coh
ort s
tudi
es:
Istheresufficientdescriptionofthegroupsandthedistributionofprognosticfactors?
•
Arethegroupsassem
bledatasimilarpointintheirdiseaseprogression?
•
Istheintervention/treatmentreliablyascertained?
•
Werethegroupscom
parableonallimportantconfoundingfactors?
•
Wasthereadequateadjustmentfortheeffectsoftheseconfoundingvariables?
•
Wasadose-responserelationshipbetweeninterventionandoutcom
edemonstrated?
•
Wasoutcomeassessmentblindtoexposurestatus?
•
Wasfollow-uplongenoughfortheoutcomestooccur?
•
Whatproportionofthecohortw
asfollowed-up?
•
Wer
e dr
op-o
ut ra
tes
and
reas
ons
for d
rop-
out s
imila
r acr
oss
inte
rven
tion
and
unex
pose
d •groups?
Cas
e-co
ntro
l stu
dies
:
Isthecasedefinitionexplicit?
•
Hasthediseasestateofthecasesbeenreliablyassessedandvalidated?
•
Werethecontrolsrandom
lyselectedfromthesourceofpopulationofthecases?
•
How
com
parablearethecasesandcontrolswithrespecttopotentialconfoundingfactors?
•
Wereinterventionsandotherexposuresassessedinthesamewayforcasesandcontrols?
•
Con
clus
ions
:
In a
revi
ew th
at in
corp
orat
es th
ese
stud
y de
-•signs,itcanbedifficulttodetermineifdifferences
are
a re
sult
of th
e in
terv
entio
n or
gro
up in
com
pat-
ibili
ties.
Res
ults
sho
uld
be v
iew
ed w
ith c
autio
n.•
38
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esTh
omas
, et a
l.
2004
Can
ada
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
MET
HO
DS
(SR
s)
To d
escr
ibe
four
issu
es
rela
ted
to s
yste
mat
ic
liter
atur
e re
view
s of
th
e ef
fect
iven
ess
of
publ
ic h
ealth
nur
sing
in
terv
entio
ns:
1. p
roce
ss o
f sys
tem
-at
ical
ly re
view
ing
the
liter
atur
e;
2. d
evel
opm
ent o
f a
qual
ity a
sses
smen
t in
stru
men
t;
3. th
e m
etho
ds/
resu
lts o
f the
EP
HP
P to
dat
e;
4. s
ome
met
hods
/re
sults
of t
he d
is-
sem
inat
ion
used
.
Dom
ains
for a
sses
sing
obs
erva
tiona
l stu
dies
:
stud
y qu
estio
n•
stud
y po
pula
tion
•
com
para
bilit
y of
sub
ject
s•
expo
sure
/inte
rven
tion
•
outc
ome
mea
sure
blin
ding
•
stat
istic
al a
naly
sis
•
resu
lts•
disc
ussi
on•
fund
ing
•
Com
pone
nts
of th
e E
PH
PP
Qua
lity
Ass
essm
ent T
ool:
sele
ctio
n bi
as•
desi
gn•
conf
ound
ers
•
blin
ding
•
data
col
lect
ion
met
hods
•
with
draw
als
and
drop
-out
s•
This
mod
el o
f sco
ring
has
been
test
ed a
s a
relia
ble
and
valid
tool
.
This
sco
ring
syst
em in
dica
tes
a pr
efer
ence
for R
CTs
an
d C
CTs
, whi
ch w
ill te
nd to
sco
re h
ighe
r tha
n no
n-ra
ndom
ized
stu
dies
.
The
best
sco
re th
at c
an b
e ac
hiev
ed b
y ob
serv
atio
nal
stud
ies
is “m
oder
ate,
” and
onl
y if
all o
ther
com
po-
nent
s of
the
stud
y ar
e ve
ry w
ell d
esig
ned.
Ram
say,
et a
l.
2003
US
A
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
MET
HO
DS
(inte
rrup
ted
time
serie
s (IT
S)
desi
gns)
To c
ritic
ally
revi
ew th
e m
etho
dolo
gica
l qua
l-ity
of I
TS (i
nter
rupt
ed
time
serie
s) d
esig
ns
usin
g st
udie
s in
clud
ed
in tw
o sy
stem
atic
re
view
s (a
revi
ew o
f m
ass
med
ic in
terv
en-
tions
and
a re
view
of
guid
elin
e di
ssem
ina-
tion
and
impl
emen
ta-
tion
stra
tegi
es).
Qua
lity
crite
ria fo
r ITS
des
igns
:
Inte
rven
tion
occu
rred
inde
pend
ently
of o
ther
cha
nges
ove
r tim
e.•
Inte
rven
tion
was
unl
ikel
y to
affe
ct d
ata
colle
ctio
n.•
The
prim
ary
outc
ome
was
ass
esse
d bl
indl
y or
was
mea
sure
d ob
ject
ivel
y.•
The
prim
ary
outc
ome
was
relia
ble
or w
as m
easu
red
obje
ctiv
ely.
•
The
com
posi
tion
of th
e da
ta s
et a
t eac
h tim
e po
int c
over
ed a
t lea
st 8
0% o
f the
tota
l num
ber
•of
par
ticip
ants
in th
e st
udy.
Theshapeoftheinterventioneffectwaspre-specified.
•
A ra
tiona
le fo
r the
num
ber a
nd s
paci
ng o
f dat
a po
ints
was
des
crib
ed.
•
The
stud
y w
as a
naly
sed
appr
opria
tely
usi
ng ti
me
serie
s te
chni
ques
.•
Find
ings
:
A to
tal o
f 66%
(n =
38)
of I
TS s
tudi
es d
id n
ot ru
le o
ut th
e th
reat
that
ano
ther
eve
nt c
ould
•
have
occ
urre
d at
the
poin
t of i
nter
vent
ion.
Rep
ortin
g of
fact
ors
rela
ted
to d
ata
colle
ctio
n, th
e pr
imar
y ou
tcom
e an
d co
mpl
eten
ess
of
•da
tase
t wer
e ge
nera
lly d
one
in b
oth
revi
ews.
All
the
stud
ies
wer
e co
nsid
ered
“effe
ctiv
e” in
the
orig
inal
repo
rt, b
ut a
ppro
xim
atel
y ha
lf of
•there-analysedstudiesshow
ednostatisticallysignificantdifferences.
Inte
rrup
ted
time
serie
s de
sign
s ar
e of
ten
anal
ysed
in
appr
opria
tely,
und
erpo
wer
ed a
nd p
oorly
repo
rted
in
impl
emen
tatio
n re
sear
ch.
39
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esC
onn
& R
antz
2003
US
A
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
MET
HO
DS
(qua
lity
and
stud
y ou
tcom
es)
1. T
o di
scus
s w
ays
that
rese
arch
ers
con-
ceiv
e of
and
ass
ess
qual
ity.
2. T
o de
term
ine
asso
-ci
atio
ns b
etw
een
stud
y qu
ality
and
out
com
es.
3. T
o de
term
ine
stra
te-
gies
for m
anag
ing
the
varie
d qu
ality
of
prim
ary
stud
ies
in a
m
eta-
anal
ysis
.
Ass
essi
ng m
etho
dolo
gica
l qua
lity:
No
sing
le s
tand
ard
exis
ts fo
r add
ress
ing
the
qual
ity v
aria
tions
in p
rimar
y st
udie
s.•
Tabl
e 1
– su
mm
ary
of c
omm
only
not
ed c
ompo
nent
s of
inte
rven
tion
rese
arch
qua
lity.
•
Inst
rum
ents
to m
easu
re q
ualit
y –
mor
e th
an 1
00 p
rimar
y st
udy
qual
ity s
cale
s ex
ist f
or
•m
easu
ring
the
qual
ity o
f prim
ary
stud
ies,
but
they
var
y dr
amat
ical
ly in
siz
e, c
ompo
sitio
n,
com
plex
ity a
nd e
xten
t of d
evel
opm
ent.
Qua
lity
mea
sure
men
t sca
les
have
dev
elop
men
tal a
nd a
pplic
atio
n pr
oble
ms.
•
Mos
t sca
les
resu
lt in
a s
ingl
e, o
vera
ll sc
ore
for q
ualit
y. O
nly
a fe
w s
cale
s co
ntai
n su
bsca
les
•thatprofilestrengthsandweaknesses.
Instrumentshaveprovendifficulttoapplyconsistently,eveninRCTs.
•
Rel
atio
nshi
p be
twee
n qu
ality
and
stu
dy o
utco
mes
:
Find
ings
hav
e of
ten
been
con
tradi
ctor
y.•
Thefindingsaremixedonwhetherlow-qualitystudiesunder-orover-estim
ateeffectsizes
•co
mpa
red
to h
igh
qual
ity s
tudi
es.
Diff
eren
t sca
les
gene
rate
div
erse
ass
essm
ents
of s
tudy
qua
lity,
whi
ch c
ause
inco
nsis
tenc
y •
in e
fforts
to re
late
stu
dy q
ualit
y to
out
com
e.
Stra
tegi
es to
man
age
qual
ity:
Set
min
imum
leve
ls fo
r inc
lusi
on o
r req
uire
that
cer
tain
qua
lity
attri
bute
s be
pre
sent
—ca
n •
invo
lve
setti
ng in
clus
ion
crite
ria, s
elec
ting
a pr
iori
parti
cula
r qua
lity-
scal
e cu
t-off
scor
es;
limitation:excludingresearchthatmaybeoflowerrigourgoesagainstthescientifichabitof
exam
inin
g da
ta, l
ettin
g th
e da
ta s
peak
.
Wei
ght e
ffect
siz
es b
y qu
ality
sco
res—
allo
ws
incl
usio
n of
div
erse
stu
dies
, but
relie
s on
•
ques
tiona
ble
qual
ity m
easu
res.
Con
side
r qua
lity
to b
e an
em
piric
al q
uest
ion—
can
exam
ine
asso
ciat
ions
bet
wee
n qu
ality
•
and
effe
ct s
izes
and
thus
pre
serv
e th
e pu
rpos
e of
met
a-an
alys
is to
sys
tem
atic
ally
exa
min
e data;limitation:problem
swithscalemeasuresofoverallqualitymaylimitconfidencein
findingsrelatedtooverallqualitymeasures.
Con
clus
ions
:
Res
earc
hers
are
incr
easi
ngly
com
bini
ng s
trate
-•
gies
to o
verc
ome
the
limita
tions
of u
sing
a s
ingl
e ap
proa
ch.
Furth
er w
ork
to d
evel
op v
alid
mea
sure
s of
•
prim
ary
stud
y qu
ality
dim
ensi
ons
will
impr
ove
the
abili
ty o
f met
a-an
alys
es to
info
rm re
sear
ch a
nd
nurs
ing
prac
tice.
40
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esLi
nde,
et a
l.
2002
Ger
man
y
STU
DIE
S
(ran
dom
ized
vs.
no
n-ra
ndom
ized
)
To in
vest
igat
e:
1. P
ossi
ble
diffe
r-en
ces
in p
atie
nts,
in
terv
entio
ns,
desi
gn-in
depe
nden
t qu
ality
asp
ects
, and
re
spon
se ra
tes
of
rand
omiz
ed v
s. n
on-
rand
omiz
ed s
tudi
es
of a
cupu
nctu
re fo
r ch
roni
c he
adac
he.
2. If
non
-ran
dom
ized
st
udie
s pr
ovid
e re
leva
nt a
dditi
onal
in
fo (i
.e.,
met
hods
/re
sults
on
long
-term
ou
tcom
es, p
rogn
ostic
fa
ctor
s, a
dver
se
effe
cts,
resp
onse
ra
tes)
.
3. P
ossi
ble
expl
ana-
tions
if re
spon
se
rate
s in
rand
omiz
ed
and
non-
rand
omiz
ed
patie
nts
diffe
r.
Qua
lity
crite
ria:
Isthesamplingmethodclearandclearlydescribed?
•
Isthereaclearheadachediagnosis?
•
Arepatientscharacterized(atleastage,sex,duration,severityofsym
ptom
s)?
•
Isthereatleastafour-weekbaselineperiod?
•
Arethereatleasttwoclinicalheadacheoutcom
es?
•
Isaheadachediaryused?
•
Areco-interventionsdescribed?
•
Areatleast90%
ofpatientsincludedanalysedaftertreatment?
•
Areatleast80%
ofpatientstreatedanalysedatearly(<
6months)follow-up?
•
Areatleast80%
ofpatientstreatedanalysedatlate(≥6months)follow-up?
•
Met
hods
/Res
ults
:
59studieswereincluded:24random
izedtrials,and35non-random
izedstudies(fivenon-
•ra
ndom
ized
con
trolle
d co
hort
stud
ies,
10
pros
pect
ive
unco
ntro
lled
stud
ies,
10
case
ser
ies
and
10 c
ross
sec
tiona
l sur
veys
).
10ofthe24RCTsand26ofthe35non-randomizedstudiesmetlessthanfivequality
•cr
iteria
.
A to
tal o
f 535
pat
ient
s re
ceiv
ed a
cupu
nctu
re tr
eatm
ent i
n th
e 24
RC
Ts c
ompa
red
to 2
695
in
•th
e 35
non
-ran
dom
ized
stu
dies
.
On
aver
age,
rand
omiz
ed tr
ials
had
sm
alle
r sam
ple
size
s, m
et m
ore
qual
ity c
riter
ia a
nd h
ad
•lo
wer
resp
onse
rate
s (0
.59;
0.4
8-0.
69) v
s. 0
.78
(0.7
2-0.
83).
Ran
dom
ized
or n
ot, s
tudi
es m
eetin
g m
ore
qual
ity c
riter
ia h
ad lo
wer
resp
onse
rate
s.•
Non-randomizedstudiesdidn’thavesignificantlylongerfollow-upperiods.O
fthethreethat
•in
clud
ed a
n an
alys
is o
f pro
gnos
tic v
aria
bles
, onl
y on
e re
porte
d on
adv
erse
effe
cts,
and
the
degr
ee o
f gen
eral
izab
ility
was
unc
lear
.
41
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esM
acLe
hose
, et a
l.
2000
UK
STU
DIE
S
(RC
Ts v
s. Q
uasi
-ex
perim
enta
l an
d ob
serv
atio
n (Q
EO
))
To c
ompa
re e
stim
ates
of
effe
ctiv
enes
s fro
m
RC
Ts v
s. Q
EO
s an
d ex
amin
e th
e as
soci
atio
n be
twee
n m
etho
dolo
gica
l qua
lity
and
mag
nitu
de o
f ef-
fect
iven
ess
estim
ates
.
Stra
tegy
1: C
ompa
re R
CT
and
QEO
stu
dy e
stim
ates
of e
ffect
iven
ess
of a
ny
inte
rven
tion,
whe
re b
oth
estim
ates
wer
e re
port
ed in
a s
ingl
e pa
per.
14 p
aper
s w
ith 3
8 co
mpa
rison
s w
ere
revi
ewed
; 25
wer
e of
low
qua
lity
and
13 w
ere
of h
igh
qual
ity.
Dis
crep
anci
es b
etw
een
RC
T an
d Q
EO
stu
dy e
stim
ates
of e
ffect
siz
e an
d ou
tcom
e fre
quen
cy
for i
nter
vent
ion
and
cont
rol g
roup
s w
ere
smal
ler f
or h
igh-
than
low
-qua
lity
com
paris
ons.
Con
clus
ion:
QE
O s
tudy
est
imat
es o
f effe
ctiv
enes
s m
ay b
e va
lid if
impo
rtant
con
foun
ding
fact
ors
are
•co
ntro
lled
for.
Low
qua
lity
com
paris
ons
tend
to h
ave
mor
e ex
trem
e Q
EO
stu
dy e
stim
ates
of e
ffect
. •
Met
hods
/resu
lts a
re li
mite
d by
the
num
ber o
f pap
ers
revi
ewed
(few
) and
pot
entia
lly u
nrep
-•
rese
ntat
ive
natu
re o
f evi
denc
e re
view
ed.
Stra
tegy
2: C
ompa
re R
CT
and
QEO
stu
dy e
stim
ates
of e
ffect
iven
ess
for
spec
ified
inte
rven
tions
, whe
re th
e es
timat
es w
ere
repo
rted
in d
iffer
ent
pape
rs.
Inte
rven
tions
: mam
mog
raph
ic s
cree
ning
(MS
BC
) of w
omen
to re
duce
mor
talit
y fro
m b
reas
t ca
ncer
; fol
ic a
cid
supp
lem
enta
tion
(FA
S) t
o pr
even
t neu
ral t
ube
defe
cts
in w
omen
tryi
ng to
co
ncei
ve.
34 p
aper
s w
ere
revi
ewed
(17
on M
SB
C, 1
7 on
FA
S).
Eightandfourpapers,respectively,wereindividuallyorclusterassignedRCTs,fiveandsix
wer
e no
n-ra
ndom
ized
tria
ls o
r coh
ort s
tudi
es, a
nd th
ree
and
six
wer
e m
atch
ed o
r unm
atch
ed
case
–con
trol s
tudi
es. T
wo
stud
ies,
one
of M
SB
C a
nd o
ne o
f FA
S, u
sed
som
e ot
her s
tudy
de
sign
.
Bot
h co
hort
and
case
-con
trol s
tudi
es h
ad lo
wer
tota
l qua
lity
scor
es th
an R
CTs
; coh
ort s
tudi
es
alsohadsignificantlylowerscoresthancase-controlstudies.
Met
a-re
gres
sion
of s
tudy
attr
ibut
es a
gain
st re
lativ
e ris
k es
timat
es s
how
ed n
o as
soci
atio
n be
twee
n ef
fect
siz
e an
d st
udy
qual
ity fo
r eith
er in
terv
entio
n.
Estimatesfrom
RCTsandcohortstudieswerenotsignificantlydifferent,butcase-controlstud-
iesgavesignificantlydifferentestimatesforbothMSBC(greaterbenefit)andFA
S(lessbenefit).
Rec
omm
enda
tions
:
Sta
ndar
ds fo
r rep
ortin
g qu
asi-e
xper
imen
tal a
nd
•ob
serv
atio
nal s
tudi
es s
houl
d be
dev
elop
ed.
Inno
vativ
e se
arch
stra
tegi
es s
houl
d be
exp
lore
d.•
Met
hods
for i
dent
ifyin
g st
udie
s th
at p
rovi
de a
•
dire
ct c
ompa
rison
of e
stim
ates
from
rand
omiz
ed
and
non-
rand
omiz
ed d
ata
are
need
ed.
42
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esB
ritto
n
1998
UK
STU
DIE
S
(RC
Ts v
s. n
on-
RC
Ts)
To e
xplo
re th
e is
sues
re
late
d to
the
proc
ess
of ra
ndom
izat
ion
that
m
ay a
ffect
the
valid
ity
of c
oncl
usio
ns d
raw
n fro
m th
e M
etho
ds/R
e-su
lts o
f RC
Ts a
nd n
on-
rand
omiz
ed s
tudi
es
Pre
viou
s co
mpa
rison
s of
RC
Ts a
nd n
on-r
ando
miz
ed s
tudi
es:
18 p
aper
s th
at d
irect
ly c
ompa
red
the
Met
hods
/Res
ults
of R
CTs
and
pro
spec
tive
non-
ran-
•do
miz
ed s
tudi
es w
ere
foun
d an
d an
alyz
ed
Nei
ther
the
RC
Ts n
or th
e no
nran
dom
ized
stu
dies
con
sist
ently
gav
e la
rger
or s
mal
ler e
sti-
•m
ates
of t
he tr
eatm
ent e
ffect
.
Thetypeofinterventiondidnotappeartobeinfluential
•
Exc
lusi
ons:
The
num
ber o
f elig
ible
sub
ject
s in
clud
ed in
the
RC
Ts ra
nged
from
1%
to 1
00%
. •
Rea
sons
for e
xclu
sion
s m
ay b
e m
edic
al (e
.g. h
igh
risk
of a
dver
se e
vent
s in
cer
tain
gro
ups)
•orscientific(selectingonly
smal
l hom
ogen
eous
gro
ups
in o
rder
to in
crea
se th
e pr
ecis
ion
of e
stim
ated
trea
tmen
t ef-
•fe
cts)
; als
o bl
anke
t exc
lusi
ons
are
com
mon
in R
CTs
Par
ticip
atio
n:
Mos
t RC
Ts fa
iled
to d
ocum
ent a
dequ
atel
y th
e ch
arac
teris
tics
of e
ligib
le in
divi
dual
s w
ho d
id
•no
t par
ticip
ate
in tr
ials
.
Adj
ustin
g fo
r bas
elin
e di
ffere
nces
:
in n
on-r
ando
miz
ed s
tudi
es: a
djus
tmen
t for
diff
eren
ces
ofte
n ch
ange
d th
e tre
atm
ent e
ffect
•sizebutnotsignificantly;importantly,thedirectionofchangewasinconsistent.
Con
clus
ions
:
Met
hods
/Res
ults
of R
CTs
and
non
-ran
dom
ized
•
stud
ies
do n
ot in
evita
bly
diffe
r
The
avai
labl
e ev
iden
ce s
uffe
rs fr
om m
any
•lim
itatio
ns b
ut it
sug
gest
ed th
at it
may
be
pos-
sibl
e to
min
imiz
e an
y di
ffere
nces
by
ensu
ring
that
sub
ject
s in
clud
ed in
eac
h ty
pe o
f stu
dy a
re
com
para
ble.
Lem
mer
, et a
l.
1999
UK
MET
HO
DS
(SR
s)
To re
port
on k
ey
problemsanddifficul-
ties
enco
unte
red
in a
sy
stem
atic
lite
ratu
re
revi
ew th
at, i
n th
e ab
-se
nce
of R
CTs
, dre
w
upon
theo
retic
al s
tud-
ies
of d
ecis
ion-
mak
ing
and
prac
tice-
base
d re
sear
ch.
A to
tal o
f 164
revi
ews
of 8
2 pu
blic
atio
ns w
ere
carr
ied
out.
AmodifiedversionoftheCochraneCollaborationProtocolw
asused:
Dat
abas
es a
nd k
eyw
ords
use
d fo
r sea
rch
stra
tegi
es w
ere
sele
cted
in c
onsu
ltatio
n w
ith a
•
libra
rian
spec
ializ
ing
in h
ealth
and
rela
ted
subj
ects
.
Om
itted
sec
tions
wer
e co
nsid
ered
inap
prop
riate
for r
epor
ts th
at w
ere
not R
CTs
, and
wer
e •
repl
aced
with
hea
ding
s fo
r a b
road
er ra
nge
of m
etho
dolo
gies
.
Ther
e w
as a
lso
a se
ctio
n at
the
end
of th
e pr
otoc
ol fo
r rev
iew
ers’
com
men
ts o
n th
e ap
prop
riate
-nessandsignificanceofeachreview
.
A nu
mer
ical
sco
re w
as g
iven
to e
ach
artic
le; 8
is th
e m
axim
um s
core
and
1 is
the
min
imum
sc
ore.
This
sca
le w
as u
sed
to in
dica
te th
e m
etho
dolo
gica
l rig
our a
nd re
leva
nce
of e
ach
artic
le.
Qua
litat
ive
stud
ies
prov
ide
little
info
rmat
ion
on m
eth-
odology,makingqualityassessm
entdifficult.•All
revi
ewer
s ag
reed
, how
ever
, tha
t som
e of
the
artic
les
that
wer
e ra
ted
poor
con
tain
ed im
porta
nt is
sues
or
sect
ions
who
se o
mis
sion
wou
ld b
e a
detri
men
t to
the
stud
y.
Reachingaconsensusscorewasdifficultasindi
-vi
dual
revi
ewer
s in
terp
rete
d ar
ticle
s an
d re
sear
ch
desi
gns
in d
iffer
ent w
ays.
43
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esG
olds
mith
, et a
l.
2007
UK
MET
HO
DS
To d
escr
ibe
the
revi
ew
met
hods
dev
elop
ed
andthedifficulties
enco
unte
red
dur-
ing
the
proc
ess
of
upda
ting
a sy
stem
atic
re
view
of e
vide
nce
to
info
rm g
uide
lines
for
the
cont
ent o
f pat
ient
in
form
atio
n re
late
d to
ce
rvic
al s
cree
ning
.
Stu
dies
of w
omen
in a
n or
gani
zed,
sys
tem
atic
cer
vica
l scr
eeni
ng p
rogr
am w
ere
incl
uded
; all
stud
y de
sign
s, e
xcep
t for
opi
nion
pie
ces,
wer
e in
clud
ed.
233
stud
ies
wer
e re
triev
ed fo
r fur
ther
eva
luat
ion.
Dat
a ex
tract
ion
was
con
duct
ed fo
r 79
stud
ies
(52
quan
titat
ive
and
27 q
ualit
ativ
e).
32 s
tudi
es w
ere
repo
rted
as re
leva
nt.
13 q
ualit
ativ
e st
udie
s, s
even
non
-com
para
tive
desc
riptiv
e st
udie
s, th
ree
RC
Ts, t
hree
qua
si-
RC
Ts, t
hree
cro
ss s
ectio
nal s
tudi
es a
nd o
ne e
ach
of c
lust
er R
CT,
retro
spec
tive
case
con
trol
stud
y, a
nd n
on-c
ompa
rativ
e tim
e se
ries
stud
y w
ere
incl
uded
in th
e re
port.
The
qual
ity o
f eac
h in
divi
dual
stu
dy w
as a
sses
sed
usin
g es
tabl
ishe
d ch
eckl
ists
: SIG
N, C
AS
P,
NZG
G, U
KG
CS
RO
and
a re
view
er c
heck
list.
The
basi
c pr
inci
ples
of t
he G
RA
DE
app
roac
h w
ere
appl
ied
to th
e sy
nthe
sis
of th
e qu
antit
ativ
e ev
iden
ce.
Con
clus
ions
:
The
revi
ew q
uest
ions
wer
e be
st a
nsw
ered
by
•ev
iden
ce fr
om a
rang
e of
dat
a so
urce
s.
Qua
litat
ive
rese
arch
was
ofte
n hi
ghly
rele
vant
•andspecifictomanycomponentsofthescreen
-in
g in
form
atio
n m
ater
ials
.
DeW
alt,
et a
l.
2004
US
A
MET
HO
DS
To re
view
the
rela
tion-
ship
bet
wee
n lit
erac
y an
d he
alth
out
com
es.
Rev
iew
of S
tudi
es –
ex
amin
es o
bser
va-
tiona
l stu
dies
that
re
porte
d or
igin
al d
ata
mea
surin
g lit
erac
y w
ith
any
valid
inst
rum
ent
and
mea
sure
d on
e or
m
ore
heal
th o
utco
mes
.
Incl
uded
stu
dies
with
out
com
es re
late
d to
hea
lth a
nd h
ealth
ser
vice
and
mea
sure
men
t of
liter
acy
skill
s w
ith a
val
id in
stru
men
t (W
est e
t al.,
200
2).
Gra
ded
each
stu
dy a
ccor
ding
to:
adeq
uacy
of s
tudy
pop
ulat
ion
•
com
para
bilit
y of
sub
ject
s•
valid
ity a
nd re
liabi
lity
of th
e lit
erac
y m
easu
rem
ent
•
mai
nten
ance
of c
ompa
rabl
e gr
oups
•
appr
opria
tene
ss o
f the
out
com
e m
easu
rem
ent
•
appr
opria
tene
ss o
f sta
tistic
al a
naly
sis
•
adeq
uacy
of c
ontro
l of c
onfo
undi
ng
•
The
aver
age
qual
ity o
f the
incl
uded
stu
dies
was
fair
to g
ood.
The
auth
ors
foun
d th
at m
ost s
tudi
es fa
iled
to a
d-eq
uate
ly a
ddre
ss c
onfo
undi
ng a
nd th
e us
e of
mul
tiple
co
mpa
rison
s.
Thom
son,
et a
l.
2006
UK
MET
HO
DS
To s
ynth
esiz
e da
ta o
n th
e ke
y so
cioe
cono
mic
de
term
inan
ts o
f hea
lth
and
heal
th in
equa
litie
s re
porte
d in
eva
luat
ions
of
the
natio
nal U
K
rege
nera
tion
prog
ram
.
Incl
uded
eva
luat
ions
that
repo
rted
achi
evem
ents
dra
win
g on
dat
a fro
m a
t lea
st tw
o ta
rget
are
as
of a
nat
iona
l urb
an re
gene
ratio
n pr
ogra
m, o
r are
a-ba
sed
initi
ativ
es (A
BIs
) in
the
UK
.
Nin
etee
n ev
alua
tions
repo
rted
impa
cts
on h
ealth
or s
ocio
econ
omic
det
erm
inan
ts o
f hea
lth. D
ata
was
syn
thes
ized
from
10
eval
uatio
ns.
Sta
ndar
d sy
stem
atic
revi
ew p
roce
dure
s w
ere
used
, inc
ludi
ng:
com
preh
ensi
ve s
earc
h st
rate
gy;
•
a pr
iori
incl
usio
n/ex
clus
ion
crite
ria;
•
two
peop
le in
depe
nden
tly re
view
ing
artic
les;
•
data
ext
ract
ion;
•
data
syn
thes
is.
•
Met
hodo
logi
cal s
hortc
omin
gs o
f the
prim
ary
stud
ies
chal
leng
ed th
e da
ta s
ynth
esis
pro
cess
.
The
auth
ors
sugg
este
d th
at in
terv
entio
ns a
nd
prog
ram
s at
tach
ed to
a th
eore
tical
fram
ewor
k (e
.g.,
theo
ry o
f cha
nge)
will
hel
p to
bui
ld e
vide
nce.
44
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esS
tein
, et a
l.
2005
US
A
MET
HO
DS
(hea
lth te
chno
l-og
y as
sess
men
ts
(HTA
s))
To in
vest
igat
e th
e as
soci
atio
n be
twee
n fre
quen
cy a
nd m
eth-
odol
ogic
al c
hara
c-te
ristic
s in
a s
ampl
e of
hea
lth te
chno
logy
as
sess
men
ts.
The
rese
arch
ers
incl
uded
revi
ews
on fo
ur in
terv
entio
ns in
thre
e re
ports
.
Dat
a w
as e
xtra
cted
from
pub
lishe
d an
d un
publ
ishe
d re
ports
and
was
ext
ract
ed b
y on
e re
view
er
and
chec
ked
by a
sec
ond.
Vario
us re
gres
sion
ana
lysi
s te
sts
wer
e un
derta
ken
on th
e da
ta.
The
stud
y ou
tcom
e m
easu
res
wer
e re
porte
d at
diff
eren
t tim
e pe
riods
, dep
endi
ng o
n th
e le
ngth
of
follo
w-u
p of
the
entir
e st
udy.
The
revi
ew lo
oked
at:
sam
ple
size
•
pros
pect
ive/
retro
spec
tive
appr
oach
•
mul
ti- o
r sin
gle-
cent
re o
rgan
izat
ions
•
cons
ecut
ive
recr
uitm
ent
•
inde
pend
ence
of o
utco
me
mea
sure
•
leng
th o
f fol
low
-up
•
publ
icat
ion
date
•
Find
ings
:
Littl
e ev
iden
ce w
as fo
und
of a
n as
soci
atio
n be
twee
n m
etho
dolo
gica
l cha
ract
eris
tics
and
•ou
tcom
e.
Sam
ple
size
and
pro
spec
tive
appr
oach
wer
e no
t sho
wn
to b
e as
soci
ated
with
out
com
e.•
All
outc
omes
wer
e su
rgic
al in
terv
entio
ns, w
hich
may
im
pact
gen
eral
izab
ility
.
The
smal
l num
ber o
f cas
es a
nd li
mite
d nu
mbe
r of
stud
ies
in e
ach
set o
f cas
e se
ries
may
lim
it th
e pr
eci-
sion
and
gen
eral
izab
ility
.
Cas
e se
ries
may
be
parti
cula
rly p
rone
to p
ublic
atio
n bi
as.
45
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esD
alzi
el, e
t al.
2005
UK
MET
HO
DS
(cas
e se
ries
in
HTA
s)
STU
DIE
S
(cas
e se
ries
vs.
RC
Ts)
1. T
o re
view
the
use
of c
ase
serie
s (C
S) i
n N
atio
nal I
nstit
ute
for
Clin
ical
Exc
elle
nce
(NIC
E) H
TA re
ports
.
2. T
o sy
stem
atic
ally
re
view
the
met
hod-
olog
ical
lite
ratu
re fo
r pa
pers
rela
ting
to th
e va
lidity
of a
spec
ts o
f ca
se s
erie
s de
sign
.
3. T
o in
vest
igat
e ch
ar-
acteristicsandfindings
of c
ase
serie
s us
ing
exam
ples
from
the
UK
’s H
TA p
rogr
am
Rev
iew
of u
se o
f cas
e se
ries
in N
ICE
HTA
s:
Of 4
7 co
mpl
eted
HTA
s, 1
4 in
clud
ed in
form
atio
n fro
m C
Ss.
Incl
usio
n cr
iteria
for C
Ss
incl
uded
stu
dy s
ize
and
leng
th o
f fol
low
-up.
Ther
e w
as n
o co
nsen
sus
on w
hich
CS
s to
incl
ude
in H
TAs,
how
to u
se th
em o
r how
to a
sses
s th
eir q
ualit
y.
Sys
tem
atic
revi
ew o
f met
hodo
logi
cal l
itera
ture
:
Asearchwasconductedtofindstudiesthatassessedaspectsofcaseseriesdesign,analysis
or q
ualit
y in
rela
tion
to s
tudy
val
idity
.
No
em
piric
al s
tudi
es w
ere
foun
d.
Investigationofcharacteristicsandfindingsofcaseseriesdesigns:
No
rela
tions
hip
was
foun
d be
twee
n sa
mpl
e si
ze a
nd o
utco
me
frequ
ency
or b
etw
een
pros
pec-
tive
data
col
lect
ion
and
outc
ome
frequ
ency
.
Oneanalysiseach(indifferenttopicareas)foundasignificantassociationbetweenmulti-centre
stud
ies
and
outc
ome,
bet
wee
n in
depe
nden
t out
com
e m
easu
rem
ent a
nd o
utco
me
frequ
ency
an
d be
twee
n ea
rlier
pub
licat
ion
and
outc
ome
frequ
ency
.
Lengthoffollow-upwasfoundtobesignificantlyassociatedwithoutcomefrequencyinthree
anal
yses
.
Com
paris
on b
etw
een
case
ser
ies
and
RC
Ts:
Com
pare
d w
ith R
CT
evid
ence
, whi
ch s
how
ed n
o di
ffere
nce
betw
een
PTC
A an
d C
AB
G, c
ase
serie
s es
timat
es o
f mor
talit
y sh
owed
a 1
–2%
incr
ease
in m
orta
lity
for C
AB
G.
For a
ngin
a re
curr
ence
, nei
ther
cas
e se
ries
nor R
CT
data
sho
wed
any
diff
eren
ce b
etw
een
the
two
inte
rven
tions
.
The
incl
uded
stu
dies
wer
e al
l sur
gica
l int
erve
ntio
ns
whi
ch m
ight
lim
it ge
nera
lizab
ility
to o
ther
inte
rven
-tio
ns o
r set
tings
.
The
stud
y dr
ew o
n a
smal
l num
ber o
f cas
es, t
here
-fo
re re
sults
sho
uld
be v
iew
ed w
ith c
autio
n.
46
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esD
eeks
, et a
l.
2003
UK
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
(eva
luat
e ab
ility
of
case
-mix
adj
ust-
men
t to
cont
rol f
or
bias
)
To c
onsi
der m
etho
ds
and
rela
ted
evid
ence
fo
r eva
luat
ing
bias
in
non
-ran
dom
ized
in
terv
entio
n st
udie
s.
Eig
ht s
tudi
es c
ompa
red
the
resu
lts o
f ran
dom
ized
and
non
-ran
dom
ized
stu
dies
acr
oss
mul
tiple
in
terv
entio
ns.
Atotalof194toolswereidentifiedforassessingnon-randomizedstudies.
Find
ings
:
The
resu
lts fr
om n
on-r
ando
miz
ed s
tudi
es s
omet
imes
diff
er fr
om th
e re
sults
of r
ando
miz
ed
•st
udie
s of
the
sam
e in
terv
entio
n.
Man
y qu
ality
ass
essm
ent t
ools
exi
st, b
ut a
ppra
isal
of s
tudi
es o
mits
key
qua
lity
dom
ains
.•
Non
-ran
dom
ized
stu
dies
sho
uld
only
be
unde
rtake
n w
hen
RC
Ts a
re n
ot fe
asib
le.
•
Kat
rak,
et a
l.
2004
Aus
tralia
CR
ITIC
AL
A
PPR
AIS
AL
TO
OLS
To s
umm
ariz
e co
nten
t, in
tent
, con
stru
ctio
n an
d ps
ycho
met
-ric
pro
perti
es o
f pu
blis
hed,
cur
rent
ly
avai
labl
e cr
itica
l ap-
prai
sal t
ools
to id
entif
y co
mm
on e
lem
ents
and
th
eir r
elev
ance
to a
l-lie
d he
alth
rese
arch
.
A to
tal o
f 121
pub
lishe
d cr
itica
l app
rais
al to
ols
wer
e in
clud
ed in
the
SR
, sou
rced
from
108
pa
pers
.
Find
ings
:
87%oftoolswerespecifictoresearchdesign,withmosttoolshavingbeendevelopedfor
•ex
perim
enta
l stu
dies
(38%
of a
ll to
ols
sour
ced)
.
Ther
e w
as g
reat
var
iabi
lity
in it
ems
cont
aine
d in
the
criti
cal a
ppra
isal
tool
s.•
12%(n=14instruments)ofavailabletoolsweredevelopedusingspecifiedempirical
•re
sear
ch.
49%
of t
he to
ols
sum
mar
ized
qua
lity
appr
aisa
l int
o a
num
eric
sum
mar
y sc
ore.
•
Few
tool
s ha
d do
cum
ente
d ev
iden
ce o
f val
idity
of t
heir
item
s, o
r rel
iabi
lity
of u
se.
•
Gui
delin
es re
gard
ing
adm
inis
tratio
n of
tool
s w
ere
prov
ided
in 4
3% (n
= 5
2) o
f cas
es.
•
Con
clus
ions
:
Ther
e is
no
“gol
d st
anda
rd” c
ritic
al a
ppra
isal
tool
•
for a
ny s
tudy
des
ign,
nor
is th
ere
any
wid
ely
ac-
cept
ed g
ener
ic to
ol th
at c
an b
e ap
plie
d eq
ually
-w
ell a
cros
s st
udy
type
s.
Con
sum
ers
of re
sear
ch s
houl
d ca
refu
lly s
elec
t •
tool
s.
Sel
ecte
d to
ols
shou
ld h
ave
publ
ishe
d ev
iden
ce
•of
em
piric
al b
asis
for t
heir
cons
truct
ion,
val
idity
of
item
s an
d re
liabi
lity
of in
terp
reta
tion,
and
gu
idel
ines
for u
se.
47
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esG
reen
halg
h, e
t al.
2005
UK
MET
HO
DS
(met
a-na
rrat
ive
revi
ew)
To in
trodu
ce/d
escr
ibe
and
repo
rt on
the
deve
lopm
ent o
f a n
ew
met
hod:
met
a-na
rra-
tive
revi
ew.
Met
a-na
rrat
ive
revi
ew w
as d
evel
oped
as
the
met
hodo
logi
cal b
ase
for t
he s
ynth
esis
of e
vide
nce
acrossmultipledisciplinaryfields.Ithasparticularstrengthsasasynthesismethodwhen:
the
scop
e of
a p
roje
ct is
bro
ad a
nd th
e lit
erat
ure
dive
rse;
•
diffe
rent
gro
ups
of s
cien
tists
hav
e as
ked
diffe
rent
que
stio
ns a
nd u
sed
diffe
rent
rese
arch
•
desi
gns
to a
ddre
ss a
com
mon
pro
blem
;
‘quality’papershavedifferentdefiningfeaturesindifferentliterature;
•
ther
e is
no
univ
ersa
lly a
gree
d-up
on p
roce
ss fo
r pul
ling
the
diffe
rent
bod
ies
of li
tera
ture
•
toge
ther
.
The
stud
y in
volv
ed a
sys
tem
atic
map
ping
pha
se to
col
lect
and
com
pare
the
diffe
rent
ove
r-ar
chin
g st
oryl
ines
of t
he ri
se a
nd fa
ll of
diff
usio
n re
sear
ch th
at is
judg
ed re
leva
nt to
the
over
all
rese
arch
que
stio
n.
Pha
ses
of m
eta-
narr
ativ
e re
view
:
plan
ning
•
sear
chin
g•
map
ping
•
appr
aisa
l•
synt
hesi
s•
reco
mm
enda
tions
•
Five
key
prin
cipl
es u
nder
pin
the
met
a-na
rrat
ive
tech
niqu
e: p
ragm
atis
m, p
lura
lism
, his
toric
ity,
cont
esta
tion
and
peer
revi
ew.
The
met
hod
shou
ld b
e te
sted
pro
spec
tivel
y.
Its c
ontri
butio
n to
the
mix
ed e
cono
my
of m
etho
ds fo
r th
e sy
stem
atic
revi
ew o
f com
plex
evi
denc
e sh
ould
be
expl
ored
furth
er.
Whi
ttem
ore
&
Knafl
2005
US
A
MET
HO
DS
(inte
grat
ive
revi
ew)
To d
istin
guis
h th
e in
te-
grat
ive
revi
ew m
etho
d fro
m o
ther
revi
ew
met
hods
and
to p
ro-
pose
met
hodo
logi
cal
strategiesspecificto
the
inte
grat
ive
revi
ew
met
hod
to e
nhan
ce
rigou
r.
Issu
es d
escr
ibed
and
stra
tegi
es u
sed
to e
nhan
ce th
e rig
our o
f the
inte
grat
ive
revi
ew m
etho
d in
nu
rsin
g:
Well-specifiedreview
purposeandvariablesofinterest—
helptoaccuratelyoperationalize
•va
riabl
es a
nd e
xtra
ct a
ppro
pria
te d
ata
from
prim
ary
sour
ces.
Well-definedliteraturesearchstrategies—identifythemaximum
num
berofeligibleprim
ary
•so
urce
s, u
sing
at l
east
two
to th
ree
stra
tegi
es.
Dat
a ev
alua
tion
stag
e—si
nce
inte
grat
ive
revi
ew m
etho
d in
clud
es d
iver
se p
rimar
y so
urce
s,
•th
e co
mpl
exity
of e
valu
atin
g th
e qu
ality
incr
ease
s; h
ow q
ualit
y is
eva
luat
ed w
ill v
ary
depe
ndin
g on
the
sam
plin
g fra
me;
in a
revi
ew w
ith a
div
erse
sam
plin
g fra
me
incl
usiv
e of
em
piric
al a
nd th
eore
tical
sou
rces
, an
appr
oach
sim
ilar t
o hi
stor
ical
rese
arch
to e
valu
ate
qual
ity m
ay b
e ap
prop
riate
.
Dat
a re
duct
ion—
divi
de th
e pr
imar
y so
urce
s in
to s
ubgr
oups
acc
ordi
ng to
som
e lo
gica
l •
syst
em to
faci
litat
e an
alys
is.
Datadisplay,datacomparison,conclusiondraw
ingandverification,presentation.
•
The
inte
grat
ive
revi
ew m
etho
d al
low
s fo
r the
com
bi-
natio
n of
div
erse
met
hodo
logi
es (e
.g.,
expe
rimen
tal
and
non-
expe
rimen
tal r
esea
rch)
and
has
the
pote
ntia
l to
pla
y a
grea
ter r
ole
in e
vide
nce-
base
d pr
actic
e.
48
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esN
orris
& A
tkin
s
2005
US
A
MET
HO
DS
(SR
s)
To e
xam
ine
the
use
of
non-
rand
omiz
ed s
tud-
ies
in e
vide
nce-
base
d pr
actic
e ce
nter
(EP
C)
repo
rts, a
ddre
ssin
g qu
estio
ns o
f the
effe
c-tiv
enes
s of
trea
tmen
t in
terv
entio
ns.
Of t
he 1
07 E
PC
repo
rts re
leas
ed b
etw
een
Feb
1999
and
Sep
t 200
4, 7
8 ex
amin
ed a
t lea
st o
ne
questionofefficacyoreffectivenessofaclinicalintervention.
49 o
f the
repo
rts in
clud
ed e
vide
nce
from
stu
dy d
esig
ns o
ther
than
RC
Ts; t
hese
repo
rts
exam
ined
pha
rmac
othe
rapy
, med
ical
dev
ises
, sur
gery
, com
plem
enta
ry a
nd a
ltern
ativ
e an
d be
havi
oura
l int
erve
ntio
ns.
Cha
lleng
es in
usi
ng n
on-r
ando
miz
ed s
tudi
es in
sys
tem
atic
revi
ews:
Term
inol
ogy
to d
escr
ibe
diffe
rent
non
- ran
dom
ized
stu
dy d
esig
ns is
inco
nsis
tent
in c
linic
al
rese
arch
lite
ratu
re a
nd E
PC
repo
rts.
Ther
e ar
e no
est
ablis
hed
guid
elin
es a
s to
whe
n no
n-ra
ndom
ized
stu
dies
can
or s
houl
d be
co
nsid
ered
for i
nclu
sion
in s
yste
mat
ic re
view
s, o
r wha
t stu
dy d
esig
ns to
con
side
r.
Itisdifficulttoassessthequalityofnon-randomizedstudies.
Of 4
9 re
ports
that
incl
uded
non
-ran
dom
ized
stu
dy d
esig
ns:
12 (2
5%) d
idn’
t ass
ess
stud
y qu
ality
;•
16%
use
d a
chec
klis
t or s
corin
g sy
stem
;•
10%
ada
pted
a p
revi
ousl
y pu
blis
hed
inst
rum
ent;
•
the
rem
aind
er (4
9%) u
sed
inst
rum
ents
that
the
revi
ewer
s ha
d de
velo
ped
them
selv
es.
•
Rec
omm
enda
tions
for s
yste
mat
ic re
view
ers:
Ass
ess
the
avai
labi
lity
of R
CTs
bef
ore
dete
rmin
-•ingfinalinclusioncriteria.
Con
side
r the
pro
s an
d co
ns o
f non
-RC
T de
sign
s.•
Pro
vide
a ra
tiona
le fo
r dec
isio
ns to
incl
ude/
ex-
•cludespecificstudydesigns.
Ass
ess
impo
rtant
dom
ains
of s
tudy
qua
lity.
•
Dis
cuss
how
incl
udin
g va
rious
stu
dy d
esig
ns m
ay
•af
fect
con
clus
ions
.
Dis
cuss
how
the
qual
ity o
f stu
dies
and
the
body
•
of e
vide
nce
may
affe
ct c
oncl
usio
ns.
Rec
omm
enda
tions
for r
esea
rche
rs:
Min
imiz
e so
urce
s of
bia
s re
gard
less
of s
tudy
•
desi
gn.
Exa
min
e th
e ef
fect
s of
var
ious
sou
rces
of b
ias
on
•m
easu
red
outc
omes
.
Use
con
sist
ent t
erm
inol
ogy
for s
tudy
des
ign.
•
Dev
ise
com
preh
ensi
ve s
earc
h st
rate
gies
for n
on-
•ra
ndom
ized
stu
dy d
esig
ns.
Jack
son
& W
ater
s
2005
Aus
tralia
MET
HO
DS
(SR
s)
To p
rovi
de re
com
men
-da
tions
to re
view
ers
on th
e is
sues
to
addr
ess
with
in a
pub
lic
heal
th s
yste
mat
ic
revi
ew a
nd to
indi
rect
ly
prov
ide
advi
ce to
re
sear
cher
s on
the
repo
rting
requ
irem
ents
of
prim
ary
stud
ies
for
the
prod
uctio
n of
hig
h qu
ality
sys
tem
atic
re
view
s.
To e
nsur
e re
view
s m
eet t
he n
eeds
of u
sers
, est
ablis
h an
adv
isor
y gr
oup
(mem
bers
sho
uld
be
fam
iliar
with
topi
c ar
ea, u
sefu
l to
incl
ude
pers
pect
ives
of p
olic
y m
aker
s, fu
nder
s, p
ract
ition
ers,
po
tent
ial u
sers
).
Issu
es to
add
ress
whe
n re
view
ing
publ
ic h
ealth
inte
rven
tions
:
incl
usio
n of
stu
dy d
esig
ns•
sear
ch fo
r pub
lic h
ealth
lite
ratu
re•
qual
ity a
sses
smen
t•
theo
retic
al fr
amew
orks
for i
nter
vent
ion
•
inte
grity
of i
nter
vent
ions
•
hete
roge
neity
•
inte
grat
ion
of q
ualit
ativ
e an
d qu
antit
ativ
e st
udie
s•
ethi
cs a
nd in
equa
litie
s –
heal
th in
terv
entio
ns e
ffect
ive
in re
duci
ng in
equa
litie
s an
d im
prov
-•
ing
heal
th o
f mar
gina
lized
/ dis
adva
ntag
ed
Rec
omm
enda
tions
:
Use
the
Qua
lity
Ass
essm
ent T
ool f
or Q
uant
itativ
e •
Stu
dies
dev
elop
ed b
y Th
omas
et a
l., 2
004.
49
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esA
tkin
s &
Di-
Gui
sepp
i
1998
US
A
MET
HO
DS
(res
earc
h on
pr
even
tive
heal
th
care
)
Dra
win
g on
exp
erie
nc-
es fr
om th
e U
S P
re-
vent
ive
Ser
vice
s Ta
sk
Forc
e, to
out
line
som
e m
ajor
are
as w
here
re
sear
ch is
nee
ded
to
definetheappropri-
ateuseofspecific
prev
entiv
e se
rvic
es
(e.g
., sc
reen
ing
test
s,
coun
selli
ng in
terv
en-
tions
, im
mun
izat
ions
, ch
emop
roph
ylax
is).
Obs
erva
tiona
l stu
dies
:
help
to e
stab
lish
linka
ges
in c
ausa
l pat
hway
;•
help
to u
nder
stan
d na
tura
l his
tory
of d
isea
se a
nd id
entif
y ris
k fa
ctor
s, m
easu
ring
com
pli-
•an
ce w
ith a
nd a
dver
se e
ffect
s of
trea
tmen
ts;
help
to d
eter
min
e ac
cura
cy o
f dia
gnos
tic te
sts;
•
helptoassessefficacyofinterventions;
•
have
an
inhe
rent
bia
s.•
Rec
omm
enda
tions
bas
ed s
olel
y on
obs
erva
tiona
l evi
-de
nce
requ
ire h
igh-
qual
ity s
tudi
es s
how
ing
cons
iste
nt
and
pref
erab
ly la
rge
effe
cts.
Whe
re o
nly
a fe
w o
bser
vatio
nal s
tudi
es o
f ade
quat
e qu
ality
are
ava
ilabl
e or
effe
ct s
izes
are
mod
est o
r in-
cons
iste
nt, a
dditi
onal
stu
dies
in d
iffer
ent p
opul
atio
ns
wou
ld b
e a
valu
able
add
ition
to th
e lit
erat
ure.
Har
den,
et a
l.
2004
UK
MET
HO
DS
(vie
ws
stud
ies)
To d
escr
ibe
the
met
hods
dev
elop
ed
for r
evie
win
g re
sear
ch
on p
eopl
e’s
pers
pec-
tives
and
exp
erie
nces
(‘‘
view
s’’ s
tudi
es)
alon
gsid
e tri
als,
with
in
a se
ries
of re
view
s on
yo
ung
peop
le’s
men
tal
heal
th, p
hysi
cal a
ctiv
-ity
and
hea
lthy
eatin
g.
Two
type
s of
stu
dies
wer
e re
view
ed:
1. In
terv
entio
n st
udie
s –
to id
entif
y ef
fect
ive,
inef
fect
ive
and
harm
ful i
nter
vent
ions
.
2. N
on-in
terv
entio
n st
udie
s –
to d
escr
ibe
fact
ors
asso
ciat
ed w
ith m
enta
l hea
lth, p
hysi
cal a
ctiv
-ity
and
hea
lthy
eatin
g.
“Vie
ws”
stu
dies
:
35 m
et th
e in
clus
ion
crite
ria.
•
Thestudiesvariedinthemethodsused—
manycouldnoteasilybeclassifiedas‘‘qualitative’’or
‘‘qua
ntita
tive.
’’
Mos
t fai
led
to m
eet s
even
bas
ic m
etho
dolo
gica
l rep
ortin
g st
anda
rds
used
in a
new
ly d
evel
oped
qu
ality
ass
essm
ent t
ool.
Thebenefitsofbringingtogetherviewstudiesinasystematicwayinclude:
gain
ing
a gr
eate
r bre
adth
of p
ersp
ectiv
es a
nd a
dee
per u
nder
stan
ding
of p
ublic
hea
lth is
-•
sues
from
the
poin
t of v
iew
of t
hose
targ
eted
by
inte
rven
tions
;
helpingreflectonstudymethodsthatmaydistort,misrepresentorfailtopickuppeople’s
•vi
ews;
crea
ting
grea
ter o
ppor
tuni
ties
for p
eopl
e’s
own
pers
pect
ives
and
exp
erie
nces
to in
form
•
polic
ies
to p
rom
ote
thei
r hea
lth.
Em
ergi
ng fr
amew
ork
repo
rted
in s
ubse
quen
t pub
lica-
tion
(Oliv
er, e
t al.,
200
5)
50
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esW
ong
& R
aabe
1996
US
A
MET
HO
DS
(met
a-re
view
)
To p
rovi
de s
ome
basi
c gu
idel
ines
for
non-
epid
emio
logi
sts
to e
valu
ate
met
a-an
al-
yses
in o
ccup
atio
nal
coho
rt st
udie
s.
Som
e pr
oble
ms
and
limita
tions
of t
radi
tiona
l qua
litat
ive
revi
ews:
Sel
ectio
n of
stu
dies
was
usu
ally
not
sub
ject
to p
re-d
eter
min
ed c
riter
ia.
•
Wei
ghts
ass
igne
d to
indi
vidu
al s
tudi
es w
ere
usua
lly s
ubje
ctiv
e.•
No
quan
titat
ive
sum
mar
ies
of ri
sk e
stim
ates
wer
e do
ne.
•
Find
ings
:
Rat
her t
han
repl
acin
g qu
alita
tive
revi
ews,
qua
ntita
tive
met
a-an
alys
is s
houl
d be
mad
e pa
rt •
of th
e ov
eral
l ass
essm
ent.
The
term
“met
a-re
view
” is
prop
osed
to e
mph
asiz
e th
e im
porta
nce
of b
oth
qual
itativ
e an
d •
quan
titat
ive
com
pone
nts
in a
com
preh
ensi
ve re
view
pro
cess
, mea
ning
it in
clud
es a
met
a-an
alys
is.
Bas
ic s
teps
in c
ondu
ctin
g a
met
a-re
view
:
1.Definetheresearchquestion.
2. C
ondu
ct a
lite
ratu
re s
earc
h.
3. D
eter
min
e cr
iteria
for i
nclu
sion
.
4. C
ondu
ct a
trad
ition
al q
ualit
ativ
e re
view
.
5. C
ondu
ct a
qua
ntita
tive
met
a-an
alys
is.
6. In
tegr
ate
tradi
tiona
l qua
litat
ive
revi
ew w
ith q
uant
itativ
e m
eta-
anal
ysis
.
7. A
pply
crit
eria
for c
ausa
tion
in in
terp
reta
tion.
Benefitsofm
eta-review
:
usef
ul in
sel
ectin
g st
udie
s, in
org
aniz
ing,
pre
sent
-•
ing
and
sum
mar
izin
g m
etho
ds/ r
esul
ts fr
om
indi
vidu
al s
tudi
es;
can
also
be
used
to d
etec
t het
erog
enei
ty a
mon
g •
stud
ies.
Majorbenefitsofconductingameta-analysisinclude:
enha
ncin
g st
atis
tical
pow
er (e
spec
ially
whe
n •
orig
inal
stu
dies
are
sm
all);
prov
idin
g a
prop
erly
wei
ghte
d su
mm
ary
risk
•es
timat
e;
taki
ng c
onsi
sten
cy o
f stu
dy a
nd m
etho
ds/re
sults
•
into
con
side
ratio
n;
min
imiz
ing
the
prob
lem
of m
ultip
le c
ompa
rison
s;•
exam
inin
g da
ta fo
r het
erog
enei
ty.
•
San
delo
wsk
i, et
al
.
2007
US
A
MET
HO
DS
(met
a-su
mm
ary)
To a
ddre
ss th
e ch
al-
leng
e of
man
agin
g th
e di
ffere
nces
pre
sum
ed
to e
xist
bet
wee
n qu
ali-
tativ
e an
d qu
antit
ativ
e re
sear
ch, a
nd a
dvan
ce
the
use
of q
ualit
ativ
e m
eta-
sum
mar
y as
a
tech
niqu
e us
eful
for
synt
hesi
zing
qua
lita-
tive
and
quan
titat
ive
descriptivefindings.
42 re
ports
(35
jour
nal a
rticl
es, s
ix u
npub
lishe
d th
eses
or d
isse
rtatio
ns a
nd o
ne te
chni
cal r
epor
t) w
ere
incl
uded
.
Of t
he 4
2 re
ports
, 12
wer
e qu
alita
tive
stud
ies,
thre
e w
ere
inte
rven
tion
stud
ies,
one
was
a m
ixed
m
etho
ds s
tudy
and
26
wer
e va
rietie
s of
qua
ntita
tive
obse
rvat
iona
l stu
dies
.
Feat
ures
of q
ualit
ativ
e m
eta-
sum
mar
ies
incl
ude:
a qu
antit
ativ
ely-
orie
nted
agg
rega
tion
appr
oach
to re
sear
ch s
ynth
esis
;•
theextraction,groupingandformattingoffindings,andthecalculationoffrequencyand
•in
tens
ity e
ffect
siz
es;
able
to s
ynth
esiz
e m
etho
ds/re
sults
of q
ualit
ativ
e an
d qu
antit
ativ
e su
rvey
s of
resp
onse
s •
obta
ined
from
sim
ilar d
ata
colle
ctio
n an
d an
alys
is p
roce
dure
s;
abletoconductaposteriorianalysesoftherelationshipbetweenreportsandfindings.
•
In c
ontra
st w
ith m
ost s
yste
mat
ic re
view
met
hods
, thi
s st
udy
is b
iase
d to
war
d in
clus
ion
rath
er th
an e
xclu
sion
of
stu
dies
.
51
Stud
yPu
rpos
eM
etho
ds/R
esul
tsC
omm
ents
/Issu
esJe
ffers
on &
Dem
i-ch
elli
1999
UK
STU
DIE
S
(exp
erim
enta
l vs.
no
n-ex
perim
enta
l)
To e
xam
ine
the
rela
-tio
n be
twee
n ex
peri-
men
tal a
nd n
on-e
xper
-im
enta
l stu
dy d
esig
ns
in v
acci
nolo
gy.
The
auth
ors
desc
ribed
exp
erim
enta
l and
non
-exp
erim
enta
l stu
dies
and
thei
r app
roac
hes.
They
ass
esse
d ea
ch s
tudy
des
ign’
s ca
pabi
lity
of te
stin
g fo
ur a
spec
ts o
f vac
cine
per
form
ance
, na
mel
y im
mun
ogen
icity
, dur
atio
n of
imm
unity
con
ferr
ed, i
ncid
ence
and
ser
ious
ness
of s
ide
ef-
fect
s an
d nu
mbe
r of i
nfec
tions
pre
vent
ed b
y va
ccin
atio
n.
Non
-exp
erim
enta
l des
igns
sho
uld
be a
pplie
d w
hen:
an e
xper
imen
t is
impo
ssib
le b
ecau
se th
e ai
m o
f •
eval
uatio
n is
to a
sses
s va
ccin
e ef
fect
iven
ess;
an e
xper
imen
t is
unne
cess
ary
(e.g
., S
mal
lpox
•
vacc
inat
ion
eval
uatio
n);
an e
xper
imen
t is
inap
prop
riate
(tria
l pop
ulat
ion
•no
t lar
ge e
noug
h to
det
ect e
vent
/out
com
e, re
luc-
tanc
e/re
fusa
l to
parti
cipa
te, e
thic
al/le
gal/p
oliti
cal
obst
acle
s);
individualefficacyismeasuredintermsofan
•in
frequ
ent a
dver
se e
vent
;
inte
rven
tions
pre
vent
rare
eve
nts;
•
the
popu
latio
n ef
fect
iven
ess
of th
e va
ccin
e is
to
•be
mea
sure
d in
term
s of
long
-term
, rar
e an
d se
ri-ou
s co
nseq
uenc
es o
f dis
ease
.
Rot
stei
n &
Lau
-pa
cis
2004
Can
ada
STU
DIE
S
(HTA
s vs
. SR
s)
To e
luci
date
the
impo
rtant
diff
eren
ces
betw
een
heal
th te
ch-
nolo
gy a
sses
smen
ts
(HTA
s) a
nd s
yste
mat
ic
revi
ews
(SR
).
The
rese
arch
ers
inte
rvie
wed
17
auth
ors
or u
sers
of H
TA.
TheyidentifiedsevenareasofdifferencesbetweenHTA
sandSRs:
1. m
etho
dolo
gica
l sta
ndar
ds (H
TAs
may
incl
ude
liter
atur
e of
rela
tivel
y po
or m
etho
dolo
gica
l qu
ality
if a
topi
c is
of i
mpo
rtanc
e to
dec
isio
n-m
aker
s)
2. re
plic
atio
n of
pre
viou
s st
udie
s (r
elat
ivel
y co
mm
on fo
r HTA
s bu
t not
sys
tem
atic
revi
ews)
3. c
hoic
e of
topi
cs (m
ore
polic
y-or
ient
ed fo
r HTA
s, w
hile
sys
tem
atic
revi
ews
tend
to b
e dr
iven
by
rese
arch
er in
tere
st)
4. in
clus
ion
of c
onte
nt e
xper
ts a
nd p
olic
y-m
aker
s as
aut
hors
(pol
icy-
mak
ers
mor
e lik
ely
to b
e includedinHTA
s,althoughtherearepotentialconflictsofinterest)
5. in
clus
ion
of e
cono
mic
eva
luat
ions
(mor
e of
ten
with
HTA
s, a
lthou
gh e
cono
mic
eva
luat
ions
ba
sed
upon
poo
r clin
ical
dat
a m
ay n
ot b
e us
eful
)
6. m
akin
g po
licy
reco
mm
enda
tions
(mor
e lik
ely
with
HTA
s, a
lthou
gh th
is m
ust b
e do
ne w
ith
caut
ion)
7. d
isse
min
atio
n of
the
repo
rt (m
ore
ofte
n ac
tivel
y do
ne fo
r HTA
s)
HTA
s ar
e no
t sol
ely
conc
erne
d w
ith th
e ev
alua
tion
of
scientificevidence.
52
53
Reference List
Scottish Intercollegiate Guidelines Network 2001–2005. (2001). SIGN 50: A guideline developers’ handbook. Retrieved February 1, 2009 from http://www.sign.ac.uk/guidelines/fulltext/50/in-dex.html.
Critical Appraisal Skills Programme (CASP) and evidence-based practice. (2005). Critical Appraisal Skills Programme: Making sense of evidence - CASP appraisal tools. Oxford, Public Health Resource Unit, Milton Keynes Primary Care NHS Trust. Retrieved February 1, 2009 from http://www.phru.nhs.uk/Pages/PHD/resources.htm.
Atkins, D., Briss, P. A., Eccles, M., Flottorp, S., Guyatt, G. H., Harbour, R. T., et al. (2005). Systems for grading the quality of evidence and the strength of recommendations II: Pilot study of a new system. BMC Health Services Research, 5, 25.
Atkins, D., & DiGuiseppi, C. G. (1998). Broadening the evidence base for evidence-based guidelines. A research agenda based on the work of the U.S. Preventive Services Task Force. American Journal of Preventive Medicine, 14, 335-344.
Benson, K., & Hartz, A. J. (2000). A comparison of observational studies and randomized, controlled trials. The New England Journal of Medicine, 342, 1878-1886.
Britton, A., McKee, M., Black, N., McPherson, K., Sanderson, C., & Bain, C. (1998). Choosing be-tween randomised and non-randomised studies: A systematic review. Health Technology Assessment, 2, 1-124.
Chou, R., & Helfand, M. (2005). Challenges in systematic reviews that assess treatment harms. An-nals of Internal Medicine, 142, 1090-1099.
Conn, V. S., & Rantz, M. J. (2003). Focus on research methods. Research methods: Managing pri-mary study quality in meta-analysis. Research in Nursing & Health, 26, 322-333.
Dalziel,K.,Round,A.,Stein,K.,Garside,R.,Castelnuovo,E.,&Payne,L.(2005).Dothefindingsofcaseseriesstudiesvarysignificantlyaccordingtomethodologicalcharacteristics?Health Technology Assessment, 9, iii-iv, 1.
Deeks, J. J., Dinnes, J., D’Amico, R., Sowden, A. J., Sakarovitch, C., Song, F., et al. (2003). Evaluat-ing non-randomised intervention studies. Health Technology Assessment, 7, 1-173.
DeWalt, D. A., Berkman, N. D., Sheridan, S., Lohr, K. N., & Pignone, M. P. (2004). Literacy and health outcomes: A systematic review of the literature. Journal of General Internal Medicine, 19, 1228-1239.
DiCenso,A.,Prevost,S.,Benefield,L.,Bingle,J.,Ciliska,D.,Driever,M.,etal.(2004).Evidence-based nursing: Rationale and resources. Worldviews on Evidence-Based Nursing, 1, 69-75.
Downs, S. H., & Black, N. (1998). The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care inter-ventions. Journal of Epidemiology and Community Health, 52, 377-384.
54
Goldsmith, M. R., Bankhead, C. R., & Austoker, J. (2007). Synthesising quantitative and qualitative research in evidence-based patient information. Journal of Epidemiology and Community Health, 61, 262-270.
Greenhalgh, T., Robert, G., Macfarlane, F., Bate, P., Kyriakidou, O., & Peacock, R. (2005). Storylines of research in diffusion of innovation: A meta-narrative approach to systematic review. Social Science & Medicine, 61, 417-430.
Greer, N., Mosser, G., Logan, G., & Halaas, G. W. (2000). A practical approach to evidence grading. Joint Commission Journal on Quality Improvement, 26, 700-712.
Harden, A., Garcia, J., Oliver, S., Rees, R., Shepherd, J., Brunton, G., et al. (2004). Applying system-atic review methods to studies of people’s views: An example from public health research. Journal of Epidemiology and Community Health, 58, 794-800.
Jackson, N., & Waters, E. (2005). Criteria for the systematic review of health promotion and public health interventions. Health Promotion International, 20, 367-374.
Jefferson, T., & Demicheli, V. (1999). Relation between experimental and non-experimental study de-signs. HB vaccines: A case study. Journal of Epidemiology and Community Health, 53, 51-54.
Katrak, P., Bialocerkowski, A. E., Massy-Westropp, N., Kumar, S., & Grimmer, K. A. (2004). A sys-tematic review of the content of critical appraisal tools. BMC Medical Research Methodology, 4:22.
Khan, K., ter Riet, G., Popay, J., Nixon, J., & Kleijnen, J. (2001). Stage II: Conducting the review. Phase 5: Study quality assessment. In: K. S. Khan, G. ter Riet, J. Glanville, et al. Undertak-ing systematic reviews of research on effectiveness. CRD’s guidance for carrying out or com-missioning reviews 2nd Edition (pp. 1-20). York: NHS Centre for Reviews and Dissemination, University of York.
Lemmer, B., Grellier, R., & Steven, J. (1999). Systematic review of non-random and qualitative re-search literature: Exploring and uncovering an evidence base for health visiting and decision making. Qualitative Health Research, 9, 315-328.
Lethaby, A., Wells, S., & Furness, S. (2001). Handbook for the preparation of explicit evidence-based clinical practice guidelines. Aukland, New Zealand: New Zealand Guidelines Group, Effective Practice Institute of the University of Auckland.
Linde, K., Scholz, M., Melchart, D., & Willich, S. N. (2002). Should systematic reviews include non-randomizedanduncontrolledstudies?Thecaseofacupunctureforchronicheadache.Jour-nal of Clinical Epidemiology, 55, 77-85.
MacLehose, R. R., Reeves, B. C., Harvey, I. M., Sheldon, T. A., Russell, I. T., & Black, A. M. (2000). A systematic review of comparisons of effect sizes derived from randomised and non-ran-domised studies. Health Technology Assessment, 4, 1-154.
Margetts, B. M., Thompson, R. L., Key, T., Duffy, S., Nelson, M., Bingham, S., et al. (1995). Develop-mentofascoringsystemtojudgethescientificqualityofinformationfromcase-controlandcohort studies of nutrition and disease. Nutrition and Cancer, 24, 231-239.
55
McIntosh, H. M., Woolacott, N. F., & Bagnall, A. M. (2004). Assessing harmful effects in systematic reviews. BMC Medical Research Methodology, 4, 19.
Norris, S., & Atkins, D. (2005). Challenges in using nonrandomized studies in systematic reviews of treatment interventions. Annals of Internal Medicine, 142, 1112-1119.
Ogilvie, D., Egan, M., Hamilton, V., & Petticrew, M. (2005). Systematic reviews of health effects of socialinterventions:2.Bestavailableevidence:Howlowshouldyougo?Journal of Epidemi-ology and Community Health, 59, 886-892.
Oliver, S., Harden, A., Rees, R., Shepherd, J., Brunton, G., Garcia, J., et al. (2005). An emerging framework for including different types of evidence in systematic reviews for public policy. Evaluation, 11, 428-446.
Ramsay, C. R., Matowe, L., Grilli, R., Grimshaw, J. M., & Thomas, R. E. (2003). Interrupted time series designs in health technology assessment: Lessons from two systematic reviews of behavior change strategies. International Journal of Technology Assessment in Health Care, 19, 613-623.
Rangel, S. J., Kelsey, J., Colby, C. E., Anderson, J., & Moss, R. L. (2003). Development of a quality assessment scale for retrospective clinical studies in pediatric surgery. Journal of Pediatric Surgery, 38, 390-396.
Rotstein, D., & Laupacis, A. (2004). Differences between systematic reviews and health technology assessments:Atrade-offbetweentheidealsofscientificrigorandtherealitiesofpolicymak-ing. International Journal of Technology Assessment in Health Care, 20, 177-183.
Sandelowski, M., Barroso, J., & Voils, C. I. (2007). Using qualitative metasummary to synthesize qualitativeandquantitativedescriptivefindings.Research in Nursing & Health, 30, 99-111.
Slim, K., Nini, E., Forestier, D., Kwiatkowski, F., Panis, Y., & Chipponi, J. (2003). Methodological index for non-randomized studies (minors): Development and validation of a new instrument. ANZ Journal of Surgery, 73, 712-716.
Spencer, L., Ritchie, J., & Lewis, J. (2004). Quality in qualitative evidence: A framework for assessing research evidence.(2nded.)London:GovernmentChiefSocialResearcher’sOffice.
Stein, K., Dalziel, K., Garside, R., Castelnuovo, E., & Round, A. (2005). Association between method-ological characteristics and outcome in health technology assessments which included case series. International Journal of Technology Assessment in Health Care, 21, 277-287.
Steuten, L. M., Vrijhoef, H. J., van-Merode, G. G., Severens, J. L., & Spreeuwenberg, C. (2004). The health technology assessment – Disease management instrument reliably measured method-ologic quality of health technology assessments of disease management. Journal of Clinical Epidemiology, 57, 881-888.
Thomas, B. H., Ciliska, D., Dobbins, M., & Micucci, S. (2004a). A process for systematically reviewing the literature: Providing the research evidence for public health nursing interventions. World-views on Evidence-Based Nursing, 1, 176-184.
56
Thomas, J., Harden, A., Oakley, A., Oliver, S., Sutcliffe, K., Rees, R., et al. (2004b). Integrating quali-tative research with trials in systematic reviews. British Medical Journal, 328, 1010-1012.
Thomson, H., Atkinson, R., Petticrew, M., & Kearns, A. (2006). Do urban regeneration programmes improvepublichealthandreducehealthinequalities?AsynthesisoftheevidencefromUKpolicy and practice (1980–2004). Journal of Epidemiology and Community Health, 60, 108-115.
West, S., King, V., Carey, T. S., Lohr, K. N., McKoy, N., Sutton, S. F., et al. (2002). Systems to rate the strengthofscientificevidence.Evidence Report – Technology Assessment (Summary), 1-11.
Whittemore,R.,&Knafl,K.(2005).Theintegrativereview:Updatedmethodology.Journal of Ad-vanced Nursing, 52, 546-553.
Wong, O., & Raabe, G. K. (1996). Application of meta-analysis in reviewing occupational cohort stud-ies. Occupational and Environmental Medicine, 53, 793-800.
Zaza, S., Wright-De Aguero, L. K., Briss, P. A., Truman, B. I., Hopkins, D. P., Hennessy, M. H., et al. (2000). Data collection instrument and procedure for systematic reviews in the Guide to Com-munity Preventive Services. American Journal of Preventive Medicine, 18, 44-74.