56
June, 2009 The Methods for the Synthesis of Studies without Control Groups Donna Fitzpatrick-Lewis • Donna Ciliska • Helen Thomas

The Methods for the Synthesis of Studies without … The Methods for the Synthesis of Studies without Control Groups Prepared for the National Collaborating Centre for Methods and

Embed Size (px)

Citation preview

June, 2009

The Methods for the Synthesis of Studies without Control Groups

Donna Fitzpatrick-Lewis • Donna Ciliska • Helen Thomas

2

3

The Methods for the Synthesis of Studies without Control Groups

Prepared for the National Collaborating Centre for Methods and Tools by

Donna Fitzpatrick-Lewis • Donna Ciliska • Helen Thomas

June, 2009

National Collaborating Centre for Methods and Tools (NCCMT)

School of Nursing, McMaster University

Suite 302, 1685 Main Street West

Hamilton, Ontario L8S 1G5

Telephone: (905) 525-9140, ext. 20455

Fax: (905 529-4184

Funded by the Public Health Agency of CanadaAffiliated with McMaster UniversityProduction of this paper has been made possible through a financial contribution from the Public Health Agency of Canada. The views expressed herein do not necessarily represent the views of the Public Health Agency of Canada.

mackint
Text Box
How to cite this resource: Fitzpatrick-Lewis, D., Ciliska, D., & Thomas, H. (2009). The Methods for the Synthesis of Studies without Control Groups. Hamilton, ON: National Collaborating Centre for Methods and Tools. [http://www.nccmt.ca/pubs/non-RCT2_EN.pdf]

4

5

Contents

Key Recommendations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Relevance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11Data Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Critical Appraisal Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Systematic Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Methods for Synthesizing Quantitative Studies without Control Groups . . . . . . 23Combining Qualitative and Quantitative Studies . . . . . . . . . . . . . . . . . . . . . . . 26Studies Comparing Results Using Differing Methods . . . . . . . . . . . . . . . . . . . . 27

Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Tables summarizing results of relevant papers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

6

Key Recommendations:1. Incorporate studies without control groups into systematic reviews, especially when

there are no other studies to consider. Studies without control groups can also pro-vide information on long-term effectiveness, rare events and adverse effects.

2. Do not generalize the direction of differences in outcome between randomized and non-controlled studies. While effect sizes are more often smaller in randomized controlled trials (RCTs), some show the reverse or the same results across study designs.

3. Use Whittemore and Knafl’s (2005) approach to include results of non-randomized trials. Whittemore and Knafl provide a five-step integrative review method that incorporates problem identification, literature search, data evaluation, data analysis and presentation of data.

4. Recommendations for methods of systematic review including studies without con-trol groups:

a) Use a critical appraisal tool that has been tested for internal consistency, test-retest and inter-rater reliability, and at least face and criterion validity (Downs & Black, 1998; Slim, Nini, Forestier et al., 2003; Steuten, Vrijhoef, van-Merode et al., 2004; Thomas, Ciliska, Dobbins et al., 2004a; Zaza, Wright-De Aguero, Briss et al., 2000).

b) Do not meta-analyse results from observational studies.5. Suggestions to authors of primary studies:

a) Ensure a strong and transparent study design in all non-randomized trials.b) Confirm that the research question determines the study design.c) Incorporate detailed information about the study; lack of information means

that readers cannot determine what has or has not been done in the study design.

d) Ensure consistency in terminology and descriptions across non-random-ized studies to facilitate comparisons.

6. Implications for further research:a) Conduct reliability and validity testing of any critical appraisal tools. b) Conduct further studies to assess the different results obtained by random-

ized versus non-randomized trials. 7. Including data from non-randomized trials is challenging; however, several benefits

can be achieved:a) A wider body of literature can be included in search strategies (Thomas,

Harden, Oakley et al., 2004).b) The scope of the questions that can be answered may be broadened. c) This research can shed light on why, how and for whom interventions or

programs succeed or fail (Oliver, Harden, Rees et al., 2005).

7

Summary

There is currently no standardized model for synthesizing results of studies that do not have control groups. The purpose of this paper is to examine synthesis methods, critical appraisal tools and studies that deal with appraising and synthesizing quantitative studies without control groups. RCTs are achievable in clinical settings, but public health interventions can rarely replicate the controlled environment of the clinic. Researchers, policy-makers and practice decision-makers often must rely on other types of study designs for their source of evidence. This paper will outline precautions to be aware of when including non-controlled studies as evidence, and recommends effective tools to use to analyse the results from these types of studies.

Methods

With the assistance of a librarian, five electronic databases were searched using keywords. In addition, reference lists of all relevant articles and grey literature were searched.

An article met the relevance criteria if:it was a primary study comparing results of the same intervention using randomized • controlled trials (RCTs) versus designs without control groups;it was a synthesis of methods used to review studies without control groups;• at least one of these designs was included: one group cohort (before/after or just • after), cross-sectional, interrupted time series or mixed methods; it was published between 1995 and November 2007.•

A paper was excluded if:the focus was RCT, clinical controlled trials or quasi-randomized studies;• the focus was a cohort analytic study (before/after on two or more groups);• the study was qualitative only;• it was published before 1995.•

Thirty-six papers passed the relevance test and full data extraction was completed on those papers. Many of the included studies were observational studies. It is important to note that while there are many definitions of observational studies, only those without control groups were included in this paper.

Findings

Fourteen articles provided critical appraisal tools that could assist in judging the method-ological rigour of primary studies. Primary studies for a systematic review should include a quality assessment of the studies. A critical appraisal tool is essential when sifting through evidence in primary studies without control groups. These tools can provide information about the strengths and weaknesses of information. Critical appraisal tools assist the reader in assessing the methodological quality, managing confounders, determining potential bi-ases and performing data analysis. These tools are particularly helpful where the criteria for

8

assessing RCTs are not applicable. Many of the tools have not been tested for validity and reliability; this testing might be a sensible next step.

Eleven systematic reviews incorporated primary studies without control groups, or examined critical appraisal tools for quality assessment of studies without control groups. The reviews that examined the feasibility of combining the results of RCTs and non-RCTs suggest that well-designed studies can produce similar results, regardless of the study type. Many of these reviews suggest that poor execution of the primary study design result in lower quality scoring, rather than the study design itself. They cited under-reporting of methodology, poor management of confounders and/or lack of consistent analytical terminology as problematic when trying to compare data between studies.

Six papers examined various methods of synthesizing and integrating quantitative data from studies without control groups. The authors agreed that there is benefit to incorporating evi-dence from diverse study designs, but quality assessment is paramount for providing trust-worthy evidence. Primary studies that are clear about potential bias and probable impact on effect can be usefully incorporated into systematic reviews.

Three articles described methods of synthesizing data from both qualitative and quantita-tive studies. Incorporating qualitative and quantitative data in one report can help build an understanding of the success or failure of interventions. Assessing the quality of studies without control groups can be problematic, especially if quality is assessed with traditional criteria used in systematic reviews or meta-analyses. Results of quality assessment were not used as exclusion criteria, but could be incorporated in the discussion of findings. Group-ing findings according to a thematic and aggregate method was offered as an appropriate and alternative approach to meta-analysis. Researchers could also incorporate some of the processes used in systematic reviews, such as having two people review and perform data extraction. Most of these authors suggested a narrative approach for presenting results when meta-analysis was not appropriate.

Two papers compared the results of different methodologies applied to the same interven-tion. Those articles pointed out that it is useful to consider different designs in reviews. Not only may they be the best available information, but they also can be more able to answer questions related to long-term effectiveness, adverse events and rare events. The strengths of including different designs in reviews outweigh the limitations.

Recommendations

1. Incorporate studies without control groups into systematic reviews, especially when there are no other studies to consider. Studies without control groups can also pro-vide information on long-term effectiveness, rare events and adverse effects.

2. Do not generalize the direction of differences in outcome between randomized and non-controlled studies. While effect sizes are more often smaller in randomized controlled trials (RCTs), some show the reverse or the same results across study designs.

3. Use Whittemore and Knafl’s (2005) approach to include results of non-randomized trials. Whittemore and Knafl provide a five-step integrative review method that

9

incorporates problem identification, literature search, data evaluation, data analysis and presentation of data.

4. Recommendations for methods of systematic review including studies without con-trol groups:

a) Use a critical appraisal tool that has been tested for internal consistency, test-retest and inter-rater reliability, and at least face and criterion validity (Downs & Black, 1998; Slim, Nini, Forestier et al., 2003; Steuten, Vrijhoef, van-Merode et al., 2004; Thomas, Ciliska, Dobbins et al., 2004a; Zaza, Wright-De Aguero, Briss et al., 2000).

b) Do not meta-analyse results from observational studies.5. Suggestions to authors of primary studies:

a) Ensure a strong and transparent study design in all non-randomized trials.b) Confirm that the research question determines the study design.c) Incorporate detailed information about the study; lack of information means

that readers cannot determine what has or has not been done in the study design.

d) Ensure consistency in terminology and descriptions across non-random-ized studies to facilitate comparisons.

6. Implications for further research:a) Conduct reliability and validity testing of any critical appraisal tools. b) Conduct further studies to assess the different results obtained by random-

ized versus non-randomized trials. 7. Including data from non-randomized trials is challenging; however, several benefits

can be achieved:a) A wider body of literature can be included in search strategies (Thomas,

Harden, Oakley et al., 2004).b) The scope of the questions that can be answered may be broadened. c) This research can shed light on why, how and for whom interventions or

programs succeed or fail (Oliver, Harden, Rees et al., 2005).

10

Introduction

The overriding goal of public health policy and practice is to protect people’s health. To achieve this goal, public health professionals consider evidence when making decisions. Historically, scientists have preferred the randomized controlled trial (RCT) study design to provide the least biased results when analysing the effectiveness of interventions. An RCT may provide rigorous study results for effectiveness questions; however, it might not be the best study design for other types of research questions (DiCenso, Prevost, Benefield et al., 2004). In public health and health promotion, it may be difficult to use randomized trials for a particular question; does that mean we should ignore evidence from other methods? Some researchers believe that only RCTs have the scientific rigour to produce valuable evidence.

In clinical settings, RCTs are achievable. But public health interventions can rarely replicate the controlled environment of the clinic. These interventions are often community-based and recruitment can be challenging, creating a selection bias. As a result, maintaining pure control groups without cross-contamination may be impossible or impractical. Public health researchers must often rely on other types of study designs, often classified as lower on the “hierarchy of evidence” (Ogilvie, Egan, Hamilton et al., 2005). However, observational stud-ies, for example, can provide valuable information for public health decision-making. Qualita-tive information can supplement statistical data by helping us understand the ‘who, why and how’ of intervention success or failure. There is a whole body of literature about the methods for synthesizing qualitative studies (meta-synthesis), but these methods are beyond the scope of this paper.

Due to the greater potential for bias in observational studies, they are often excluded from systematic reviews of treatments. However, they offer several advantages to RCTs, includ-ing: being less costly; allowing for larger sample sizes; and providing more long-term out-come measurements (Benson & Hartz, 2000). As well, observational studies may provide the “best available evidence” in certain situations (Ogilvie et al., 2005).

There is currently no standardized model for synthesizing the results of studies that do not have control groups. The purpose of this paper is to examine synthesis methods, criti-cal appraisal tools and studies that deal with appraising and synthesizing the results from quantitative studies that lack control groups. This paper will also outline some precautions to take when using non-controlled studies as evidence, and will make recommendations for processes and tools.

Methods

The methods used to collect the literature for this paper included:a comprehensive literature search of published literature from January 1995 to • October 2007;review and retrieval of references from relevant articles;• a search and retrieval of potentially relevant grey literature.•

The initial search was conducted by a skilled librarian, using the following key search words:

11

review, systematic review, literature review, meta-analysis, not random and not randomized clinical trials. These key word searches were conducted in a several databases: PsycINFO, OVID MEDLINE, EMBASE, CINAHL and Scholars Portal. The initial search located 9324 potentially relevant articles. Titles and abstracts were read independently by two reviewers who passed on 182 articles for full relevance testing.

Relevance

An article was included for full relevance testing if:it was a primary study comparing results of the same intervention using randomized • trials versus designs without control groups;it was a synthesis of methods used to review studies without control groups;• at least one of these designs was included: one group cohort (before/after or just • after), cross-sectional, interrupted time series or mixed methods; it was published between 1995 and November 2007.•

A paper was excluded if:the focus was RCT, clinical controlled trials or quasi-randomized studies;• the focus was a cohort analytic study (before/after on two or more groups);• the study was qualitative only;• it was published before 1995. •

Thirty-six papers passed the relevance test, and researchers completed full data extrac-tion on those papers. Many of the final papers were observational studies. There are many definitions for observational studies; for this paper, only observational studies with no control groups were considered relevant.

Data Synthesis

Researchers synthesized 36 papers in a narrative format by topic:critical appraisal tools• systematic reviews• methods for synthesizing quantitative studies without control groups• combined qualitative and quantitative studies• comparing results using differing methods•

Some papers could have been included in more than one section; however, we placed the papers in the groups that appeared most suitable.

The paper provides a narrative synopsis of the relevant papers. The results are also detailed in table format (See Appendix 1).

12

Findings

Critical Appraisal Tools

A total of 14 papers described critical appraisal tools.

Rangel, Kelsey, Colby et al., (2003) addressed the lack of a standardized measurement tool to determine the quality of observational studies. They developed a quality assessment in-strument for non-randomized studies that incorporated 30 items within three subscales. The subscales assessed clinical relevance, reporting methodology and the strength of conclu-sions. Global ratings were determined by combining each subscale, and the studies were thereby rated as ‘good,’ ‘fair’ or ‘poor.’ They achieved high inter-rater reliability with levels of agreement reaching 84.6% concordance of results (n = 1573 items). For all individual subscales, there was range of 73.3% to 85.8% (n = 60 to 1258). The potential applications for this tool include:

facilitating the development of standardized methodology for conducting systematic • reviews of retrospective data; helping to develop a strategy for meta-analysis of non-randomized trials; • developing standardized reporting guidelines for use in peer-reviewed journals.•

Margetts et al. (1995) developed a scoring system to judge the scientific quality of obser-vational epidemiological studies linking diet with the risk of cancer. Case-control and cohort studies were reviewed separately because some of the reliability markers applied to only one of these study types. Case-control studies were rated on three broad areas: quality of dietary assessment; recruitment of subject; and the analysis of results. Cohort studies were scored on four areas: dietary assessment, definition of cohort, ascertainment of disease cases and analysis of results. Inter-rater reliability was assessed and resulted in a high level of agreement between reviewers, with the cohort reviewers attaining a slightly higher level of agreement. This prototype scoring system helped the authors to describe the epidemiologi-cal data on diet and cancer; however it may not be generalizable to other topics.

Downs and Black (1998) developed and tested a checklist to assess randomized and non-randomized studies. They used epidemiological principles, reviews of study designs and existing checklists to develop their checklist, consisting of 26 items distributed between five subscales:

reporting• external validity• bias• confounding• power•

The maximum score was 31. Twenty-three of the questions could be asked of any study of any health care intervention. The other three questions were topic sensitive and were cus-tomized to provide the raters with information on confounders, main outcomes and the sam-

13

ple size required for clinical and statistical significance (p<0.05). The quality index had high internal consistency, good test-retest and inter-rater reliability and good face and criterion validity. It performed as well as other established checklists on randomized trials, and there was little difference between its performance with non-randomized and randomized studies.

To address the lack of validated instruments to measure quality of observational or non-ran-domized trials, Slim et al. (2003) tested a methodological index for non-randomized studies (MINORS). This tool includes 12 items (the first eight items were designed specifically for non-comparison trials):

a clearly stated aim• inclusion of consecutive patients• prospective collection of data• endpoints appropriate to the aim of the study• unbiased assessment of the study endpoints• a follow-up period appropriate to the aim of the study• loss to follow-up less than 5%• prospective calculation of the study size• an adequate control group• contemporary groups• baseline equivalence of groups• adequate statistical analysis•

This tool had good reliability, internal consistency and validity.

Zaza et al. (2000) developed a format to classify and provide descriptive components of public health inventions and assess the quality of the study’s execution. The authors ac-knowledged that all study designs have issues particular to their specific design, therefore the quality questions were developed to reflect those specific issues. For example, the authors suggested that while blinding is an important component to consider for randomized trials, it is not appropriate to assess validity in a time series design. Zaza et al. provided cat-egories of questions to assess potential threats to the validity of results in primary studies, including:

description of the invention• sampling• reliability and validity of outcome measurements• the appropriateness of the statistical analysis• the interpretation of results•

One aim of their approach was to design a tool that was flexible enough to be useful in the assessment of divergent study designs and interventions. The standardized abstraction form and procedure improved the validity and reliability of the Guide to Community Prevention Services: Systematic Reviews and Evidence-Based Recommendations.

14

Greer et al. (2000) presented and discussed a system of grading evidence that was devel-oped by the Institute for Clinical Systems Improvement (ICSI). This approach was developed in response to feedback indicating that an existing system was not working. Its goal is to as-sist busy practitioners who need to judge the quality of evidence. After reviewing many other grading systems, the ICSI working group agreed that it was important to separate the evalu-ation of the individual research reports from the assessment of the totality of the evidence supporting the conclusion. The subsequent worksheet for grading evidence—the primary tool—included a statement of conclusion, a summary of the research reports that support or dispute the conclusion, assignment of classes and quality markers to the research reports and assignment of a grade to the conclusion. Primary reports of new data collection were assigned a letter grade of A to D, depending on the type of study: randomized controlled trial; cohort study; non-randomized trial with concurrent or historic controls; case-control study; study of sensitivity and specificity of a diagnostic test; population-based descriptive study; cross-sectional study; case series; or case report. The report also included a five-point quality system to rate both positive and negative study design attributes, with ratings of plus, negative or neutral. The questions included:

a priori inclusion/exclusion criteria• bias• statistical or clinically significant results• generalizability of results to other populations• clearly outlined study design•

This system has been applied to more than 40 ICSI guidelines and technology assessment reports. The authors reported that the system appears to be successful in reducing the complexity of other grading systems, while yielding defendable classification of conclusions based on the strength of the underlying evidence.

Steuten et al. (2004) critiqued the tools used to assess the methodological quality of the Health Technology Assessment (HTA) of disease management. The authors developed an inventory of problems that arise when assessing HTA studies, including: study design (RCT vs. observational); criteria for selection and restriction of patients; baseline and outcome measures; blinding of patients and providers; the description of complex, multifaceted inter-ventions; and avoidance of co-interventions. They developed, proposed and validated a new instrument for assessing the methodological quality of HTA for disease management that includes four components:

study population• description of intervention• measurement of outcomes• data analysis/presentation of data•

The instrument includes 15 items that cover internal validity, external validity and statistical considerations. This instrument is reliable in terms of hierarchical ranking, test-retest reliabil-ity and inter-rater reliability when applied to HTAs of disease management.

Two reports (Chou & Helfand, 2005; McIntosh, Woolacott, & Bagnall, 2004) suggested

15

that evidence of harm is as important as evidence of effect. However, standard systematic reviews focusing on effectiveness and randomized trials may not be an efficient model for evaluating harm. Literature related to harm is found in unpublished trials, observational stud-ies and grey literature. Applying existing quality checklists to this data set was problematic and did not readily capture the types of information pertinent to harmful effects. McIntosh et al. adapted published checklists to reflect information about harmful effects found in obser-vational cohort studies. This new checklist also incorporated items such as how and when events were reported, and whether the time at which they occurred during the study was recorded. However, the greatest barrier to applying any checklist was the lack of method-ological detail provided by the primary study authors. The researchers also included narra-tive outcome reports. Chou and Helfand (2005) examined case reports and observational studies of harmful effects of treatment. They too found quality assessment difficult, as instru-ments for assessing these types of studies were rare or inconsistent. Developmental rigour, scope, and the number and type of items used were all contentious issues. They pilot-tested a quality assessment tool for RCTs, clinical control trials (CCTs) and cohort studies for an uncontrolled surgical series that reported complications from carotid endarterectomy. This eight-point tool provided a ranking of ‘good,’ ‘fair’ or ‘poor.’ The tool was tested on studies for one intervention. The authors cautioned researchers to avoid inappropriate pooling of statis-tical results from observational studies, given the potential for bias due to inadequate control of confounders and selection bias.

The GRADE Working Group (Atkins, Briss, Eccles et al., 2005) reported on the development and pilot testing of the GRADE approach to assessing evidence and recommendations. Twelve evidence profiles were independently graded by 17 judges who all had experience using other approaches to assessing evidence and recommendations. Each evidence profile was based on information available in a systematic review and included two tables, one for quality assessment and one for the summary of the findings. The quality assessment process was designed so that each outcome was evaluated separately. The outcome table displayed the number of studies that had reported that outcome, the study design (RCTs or observational) and the quality of the studies. Assessment of the quality of outcome revealed that factors other than study design, quality, consistency and directness affected the review-er’s judgment about quality. These other factors included: sparse data, strong associations, publication bias, dose response and situations where all plausible confounders strengthened rather than weakened confidence in the direction of effect. The judges were asked about the ease of understanding and the sensibility of the approach; they generally agreed that the GRADE approach was clear, understandable and sensible.

Khan et al. (2001) provided information on what needs to be included in any quality assess-ment checklist. In effectiveness trials, RCTs should be considered first when they are avail-able and well-designed. In the absence of good RCTs, well-designed quasi-experimental and observational studies should be considered. Khan et al. provided several questions to consider when assessing the quality of observational studies. They cautioned that although conclusions can be drawn from these studies, it may be unclear whether the groups within studies are comparable. The authors suggested that results need to be reported with cau-tion, and often with recommendations for further research.

Thomas, Ciliska, Dobbins and Micucci (2004) used a quality assessment instrument that

16

allowed for the methodological rating of studies that were not randomized control trials, for inclusion in a systematic review. This method was specifically developed for quality assess-ment of public health research. Quality assessment components included:

selection bias• study design• confounders• blinding• data collection methods• withdrawals or dropouts•

Studies rated with this system achieved a ‘strong,’ ‘moderate’ or ‘weak’ methodological rat-ing. The instrument had good inter-rater and test-retest reliability. It also had proven content and criterion validity, randomized control trials and clinical control trials were rated as strong and non-RCTs such as cohorts and observational studies were rated as moderate. However, non-randomized studies could achieve an overall study rating of ‘moderate’ if the other com-ponents of the study were strong. Data is extracted from studies and reported in a narrative format. Meta-analysis was rarely done as it was found that public health study populations and interventions are often too heterogeneous for this type of analysis. Data extraction al-lowed for reporting statistically significant (or lack of) effects from each study, and the re-searchers commented on the clinical meaningfulness of any statistically significant effect.

Ramsay et al. (2003) reviewed the quality of Interrupted Time Series (ITS) using studies included in two systematic reviews. They developed a quality assessment tool that incorpo-rated the following criteria:

the intervention occurred independently of other changes over time;• the intervention was unlikely to affect data collection; • the outcome was assessed blindly or measured objectively;• the outcome was reliable or measured objectively;• the composition of the data set at each time point covered at least 80% of all par-• ticipants in the study;the shape of the intervention effect was pre-specified;• the rationale for the number and spacing of data points was described;• the study was analysed appropriately using time series techniques. •

They found that 37 of the 58 studies were not analysed appropriately, and that 33 of those should be re-analysed. Poor study designs, insufficient power and inappropriate analysis of ITS studies resulted in misleading conclusions being presented in the published literature. The authors recommend that all data from ITS studies be re-analysed before inclusion in reviews.

Conn and Rantz (2003) explored ways in which researchers can manage quality when considering primary studies within a meta-analysis. While there are more than 100 scales available to measure the quality of these studies, they vary in size, composition, complexity and extent of development. Establishing the validity of the scales is complicated and chal-lenging. Scales are not reliable and accurate for all areas of science, and application of the

17

scales can be problematic and inconsistent. Conn and Rantz outlined three methods that could potentially manage quality:

setting a quality threshold• weighting by quality scoring• considering quality as an empirical question•

However, as a stand-alone method, each has limitations. For instance, setting a quality threshold may result in the exclusion of less rigorous research—data that may contain im-portant information. Weighting by study quality can be limited by the scaling tools used and some potentially inherent problems such as inter-rater reliability. Finally, considering quality as an empirical question may lead to overall measures masking interesting effects of indi-vidual components of quality on effect-size estimates. To accommodate for these limitations, researchers may need to combine strategies to include studies that are rated as method-ologically weak but still contain important information.

Section Summary

Reviewing literature to be included in a systematic review should include a quality assess-ment. A critical appraisal tool is essential when sifting through evidence in primary studies without control groups. These tools can provide information regarding the strengths and weaknesses of information. Critical appraisal tools assist the reader in assessing the evi-dence provided based on methodological quality, management of confounders, potential biases and data analysis. These tools are particularly helpful where systems of assessment used for RCTs are not applicable. Many of the tools presented here have not been tested for validity and reliability. This is a reasonable next step.

Systematic Reviews

Eleven systematic reviews considered the question of synthesizing data from studies without control groups.

Linde, Scholz, Melchart and Willich (2002) examined whether systematic reviews should include both randomized and non-randomized studies. The researchers examined the data on the use of acupuncture for chronic headaches to explore the following questions:

1. Do randomized and non-randomized studies of acupuncture for chronic headaches differ in regard to patients, interventions, design-independent quality aspects and response rates?

2. Do non-randomized studies provide relevant additional information (results of long-term outcomes, prognostic factors, adverse effects or complications and response rates in representative and well-defined groups of patients)?

3. If response rates in randomized and non-randomized patients differ, what are the possible explanations?

Fifty-nine studies met the inclusion criteria, of which 24 were randomized trials and 35 were non-randomized trials (five non-randomized control trials, 10 prospective uncontrolled stud-

18

ies, 10 case series and 10 cross-sectional surveys). A total of 535 patients received acu-puncture treatment in the 24 RCTs, compared with 2695 in the 35 non-randomized studies. On average, randomized trials had smaller sample sizes, met more quality criteria and had lower response rates to treatment (0.59; 0.40–0.69) vs. (0.78; 0.72–0.83). Regardless of randomization status, studies meeting more quality criteria had lower response to treatment rates. Follow-up time was not significantly greater in the non-randomized studies. In the case of acupuncture for chronic headaches, non-randomized studies confirmed the findings in a previous study (Greenhalgh, Robert, Macfarlane et al., 2005) that the treatment was likely to be effective. However, non-randomized studies provided little additional relevant in-formation on long-term effects, prognostic factors or adverse effects. In general, the authors concluded that non-randomized studies of good quality yielded results similar to RCTs. This suggests that their inclusion in systematic reviews, while increasing the workload for authors of these reviews, may provide a useful extension for generalizability.

MacLehose et al. (2000) investigated the association between methodological quality and estimates of effectiveness by comparing RCTs and quasi-experimental and observational (QEO) studies. The authors employed two strategies when reviewing the literature:

1. comparing RCT and QEO study estimates of effectiveness of any intervention, where both estimates were reported in one paper;

2. comparing the RCT and QEO estimates of effectiveness for specified interventions, where the estimates were reported in different papers.

For strategy 1, the authors identified 14 papers containing 38 comparisons, of which 13 were classified as high quality and 25 were classified as low. Quality assessment criteria included:

study estimates derived from the same population;• blinding of outcome assessors;• the extent to which the QEO study estimate took possible confounding into ac-• count.

The findings indicated that the discrepancies between RCT and QEO study estimates of effect size and outcome frequency for intervention and control groups were smaller for high-quality than for low-quality comparisons. In addition, there was no tendency for QEO study estimates of effect size to be more extreme for high-quality comparisons than low-quality RCTs. For strategy 2, the specific interventions included mammography screening to re-duce breast cancer mortality and folic acid supplements to reduce neutral tube defects. The authors identified 34 papers, of which 12 papers were RCTs, 11 were non-randomized trials or cohort studies and nine were matched or unmatched case-control studies. These studies were assessed based on

the quality of the reporting;• the generalizability of the results;• the extent to which estimates of effectiveness may have been subject to bias or • confounding.

Cohort and case-control studies had lower quality scores than RCTs, and cohort studies had

19

lower quality scores than case-control studies. Meta-regression of study attributes against relative risk estimates showed no association between effect size and study quality. Esti-mates for RCTs and cohort studies were not significantly different; however, case-control studies gave significantly different estimates in the mammography studies (greater benefit) and the folic acid studies (less benefit).

Britton et al. (1998) explored issues related to the process of randomization that may af-fect the validity of conclusions drawn from the results of RCTs and non-randomized studies (including studies without control groups). The authors asked four research questions to examine the issues of randomization and validity:

1. Do non-randomized studies differ systemically from RCTs in terms of treatment ef-fect?

2. Are there systematic differences between the included and excluded individuals, and do these influence the measured treatment effect?

3. To what extent is it possible to adjust for baseline differences between study groups?

4. How important is patient preference in terms of outcomes?

Their study included 18 papers that directly compared the results from RCTs and non-ran-domized studies. The results found no consistent reporting of larger or smaller estimates of treatment effect based on study design. The authors also reported that the intervention type did not seem to be influential; however, they cautioned that more study is needed to vali-date that inference. In RCTs, blanket exclusions were common, and the number of included eligible subjects ranged from 1% to 100%. Large clinical databases containing detailed information of patient severity and prognosis were used instead of RCTs. Where the da-tabase subjects were selected according to the same inclusion criteria as RCTs, the treat-ment effects of the two designs were similar. Documentation of the characteristics of eligible individuals who did not participate in the trials was poorly recorded in most RCTs. Treatment effect measures in RCTs may be exaggerated due to the larger participation of university and teaching centres as compared to non-randomized studies. In non-randomized stud-ies, adjustment for differences often changed the treatment effect size, but not significantly; more importantly, the direction of the change was consistent. Only four papers addressed the role of patient preference on results. In those papers, preference accounted for some of the observed differences between study designs. Britton et al. concluded:

a well-designed non-randomized study is preferable to a small, poorly designed • and exclusive RCT;RCTs should include a wide range of practice settings; • study populations should be representative of all patients receiving the intervention; • exclusions for administrative convenience should be rejected; • differences should be minimized by ensuring that subjects in both kinds of study • are comparable.

Lemmer, Grellier and Stevens (1999) modified the Cochrane Collaboration Protocol for Sys-tematic Reviews for a report that explored the evidence for decision-making and health visits (home health care provision) in Britain. Health visiting and decision-making are influenced

20

by factors such as social and environmental forces, which tend not to be captured in the evidence emerging from randomized control trials. The adapted Cochrane Protocol omitted sections that were felt to be inappropriate for non-RCT research papers. The methodologi-cal rigour and relevance of each article was determined with a numerical score ranging from 8 (maximum) to 1 (minimum). The authors did not provide the variables used to measure methodological rigour. Two members of the research team reviewed each article, and week-ly consensus meetings were held to reach agreement on scoring differences. The scores were intended to provide a rapid overview of the methodology and relevance of each article; however, the final aggregate score provided no indication of the allocation of marks. Review-ers commented on the appropriateness and significance of each article, but these comments were not part of the scoring. The authors found that the scoring system was a useful way to guide discussion at the consensus meetings. Scoring was impacted by the reviewers’ inter-pretation of qualitative methodologies and relevance of articles. They also suggested that this more open approach allowed articles that contained important information or sections to be reviewed in spite of low scores. Some papers contained very brief methods sections, which made the scoring difficult and resulted in low scores.

Acknowledging that synthesizing diverse data in reviews is a challenge, Goldsmith, Bank-head and Austoker (2007) developed a new review method. They implemented it on a re-view of evidence to inform guidelines for the content of patient information related to cervical screening. They followed standard procedures for conducting a systematic review, including:

a priori study design• comprehensive literature search• two people independently reviewing titles and abstracts• quality assessment• data extraction•

The researchers used checklists to perform quality assessment on both quantitative and qualitative studies. Established checklists were used for quantitative studies (Critical Ap-praisals Skills Program (CASP), Scottish Intercollegiate Guidelines Network (SIGN), New Zealand Guidelines Group (NZGG) and UK Government Chief Social Researcher’s Office (UKGCSRO)) (CASP, 2005; Khan et al., 2001; Lethaby, Wells, & Furness, 2001; Spencer, Ritchie, & Lewis, 2004) . The researchers developed a checklist for the qualitative studies that included purpose, population, response rate, outcome definition and assessment. All checklists included a comments section. The methodological quality of studies was rated as ++ (all or most criteria were fulfilled), + (some criteria were fulfilled) or – (few or no criteria were fulfilled). The quality appraisal provided insight into the strengths and weaknesses of each study. However, methodological flaws did not result in studies being excluded. Quality scores were incorporated in the data synthesis phase of the review. Goldsmith et al. (2005) incorporated the synthesis guidelines set out in the GRADING system (Atkins et al., 2005) discussed in the Critical Appraisal Tools section of this paper.

DeWalt et al. (2004) conducted a systematic review of studies that measured literacy plus one or more health outcomes. The eligible study designs were observational: prospective and retrospective cohort studies, case-control studies and cross-sectional studies. Their review followed a standard systematic review process, including:

21

a comprehensive literature search of several electronic databases;• well-defined research questions;• inclusion/exclusion criteria;• two authors reviewing full articles• quality assessment and reporting of findings. •

The researchers adapted the quality assessment tool from West et al. (2002). Each study was rated according to:

study population• comparability of subjects• validity and reliability of the literacy measure• maintenance of comparable groups• outcome measure• statistical analysis• controlling for confounding•

Based on these criterions, the studies received a ‘good,’ ‘fair’ or ‘poor’ rating. This tool has not been validated. Although the review authors were able to provide a statistical analysis of the literacy and health outcomes, they cautioned that the primary study data posed many challenges due to the various reading measures used and cut points for analysis. As well, lack of adequate statistical measures, inadequate controlling for confounders and lack of adjustment for multiple comparisons made comparisons between studies difficult.

Thomson et al. (2006) examined data reporting on socioeconomic determinants of health in the UK to determine if governmental investment in the area had improved health. The data from 10 evaluations were synthesised using systematic review methods, including:

a search strategy• a priori inclusion/exclusion criteria• two people independently reviewing articles• data extraction• data synthesis•

Eight of the impact evaluations used case studies where data were gathered from a few sites to represent the national program. Methodological issues with the evaluation reports in-cluded poor evaluation methods and data sources and low sample sizes. The authors used a narrative synthesis to report findings. They reported that this approach was a new way to build evidence with the message tailored to impact the development of health public policy. The synthesis was challenged by the methodological shortcomings of the included studies.

Two studies (Stein, Dalziel, Garside et al., 2005; Dalziel, Round, Stein et al., 2005) exam-ined the quality of evidence of effectiveness used in health technology assessments (HTA). Case series, commonly used in HTAs, constitute a weak form of evidence in the hierarchy of evidence. In ideal situations, case series data would not be used in systematic reviews; however, there are times when they might be the only available evidence. Finding no simi-

22

lar studies in their literature search, Stein et al. examined a sample of systematic reviews of case series that explored the association between the characteristics of case series and outcome. They did not compare case series results with RCT results. In their analysis, they found little evidence of association between methodological features, sample size or prospective approach and outcome. They provided a narrative report of their findings, and included a table of individual outcomes of the analysed variables. They cautioned that all the interventions studied were surgical, which might limit the generalizability of their find-ings. In a second study, Dalziel et al. compared results from RCTs and case series studies in surgical interventions. They examined reports that included data from both RCTs and case series. They compared these studies using the intervention arm of the RCTs as a comparator: meta-analysis of RCT was compared with weighted robust regressions using the intervention as the confounding factor and estimating the coefficient size. Their analysis revealed that the RCTs showed no outcome difference between treatment types, while the case series showed an increase in mortality of 1–2% between treatments. Limitations of this review include:

analysis was constrained by methodological flaws in the case series studies, such • as poor reporting of data;use of specific surgical interventions may limit the generalizability of findings;• findings were based on a small number of studies. •

In a report that examined evidence for, and methods of, evaluating non-randomized trials, Deeks et al. (2003) identified 194 tools used to assess the quality of these trials. Sixty tools covered at least five of six pre-specified domains for internal validity and were classified as ‘top tools.’ Fourteen tools were ranked as ‘best tools,’ covering three of four core items of particular importance for non-randomized studies (allocation, comparable groups, prognostic factors identified and use of case-mix adjustment). The authors identified six of 14 tools as being suitable for use in systematic reviews. The strength of those tools was the phrasing of items that channelled the reviewer’s responses in a systematic way to ensure the assess-ments were as objective as possible.

Katrak et al. (2004) produced a systematic review of 121 published critical assessment tools from 108 papers located by searching electronic databases and the Internet. Most of the tools (87%) were specific to research design, with many (45%) of those developed for ex-perimental studies (RCTs and CCTs). Of a total of 16 generic critical appraisal tools, six were developed for experimental and observational studies. Eleven tools were found to be useful for any qualitative and quantitative research design. The authors extracted 74 items from 19 critical appraisal tools for observational studies. These items focused on:

data analyses• consideration of confounders• sample size or power calculation• whether appropriate statistical analysis was undertaken•

They extracted 36 items from the seven critical appraisal tools for qualitative studies. Most of the items focused on assessing external validity, methods of data analyses and justifica-tion of the study. These tools did not contain items about sample selection, randomization,

23

blinding, intervention or bias. The generic tools, reportedly usable for either experimental or observational studies, contained items that focused on sampling selection and data analy-ses, such as appropriateness of statistical analyses and sample size and power calculation. This review found no gold standard appraisal tool for any type of study.

Section Summary

These systematic reviews successfully incorporated primary studies with data that did not have control groups, or they examined studies with critical appraisal tools for the quality assessment of studies without control groups. The reviews that examined the feasibility of combining the results of RCTs and non-RCTs suggest that well-designed studies can pro-duce similar results, regardless of the study type. Many of these reviews suggest that poor execution of the primary study design, rather than the study design itself, result in lower quality scoring. They site under-reporting of methodology, poor management of confounders and/or lack of consistent analytical terminology as problematic when trying to compare data between studies.

Methods for Synthesizing Quantitative Studies without Control Groups

This section includes five papers that described methods for synthesizing quantitative stud-ies without control groups.

In a 2005 paper, Greenhalgh et al. described their process of meta-narrative review, devel-oped as a methodology for the synthesis of evidence across disciplines. They used a six-step process that included planning, searching for literature, mapping, doing an appraisal, synthesizing and making recommendations. According to the authors, their approach of producing “storied” accounts of the key research traditions moved complex literature out of confusion and into “sensemaking.” Five key principles underpinned this meta-narrative technique:

pragmatism• pluralism• historicity• contestation• peer review•

This approach may be helpful in situations where the scope of the research is broad and the literature diverse, and where researchers have approached a common problem using differ-ent study designs.

Whittemore and Knafl (2005) highlighted the integrative review method as a good option for integrating evidence from studies that emerge from diverse methodologies. The authors outlined strategies to enhance the rigour of integrative reviews. They proposed five steps to help ensure the methodological rigour of this type of review:

1. Problem identification: Ensures the research question is clearly defined. 2. Literature search: Incorporates a comprehensive search strategy.

24

3. Data evaluation: Becomes somewhat complex due to the methodologically diverse primary studies included in the review. Evaluating quality cannot follow the stan-dard systematic review process, but may need to focus on examples of authentic-ity, methodological quality, informational value and representativeness of available primary studies.

4. Data analysis: Includes data reduction, display, comparison and conclusions. Data reduction is achieved through an overall classification system in which primary studies are divided into subgroups that provide a logical order for analysis. This subgroup classification can be based on types of evidence, chronology, setting, sample characteristics or by a predetermined conceptual classification. The next step in the data reduction process is to code and extract data into a manageable framework. Data display involves converting the extracted data by specific vari-ables and subgroups. These displays include matrices, graphs and charts that allow for comparison across studies. Data comparison identifies patterns, themes and relationships between and amongst the variables. The final stage in data analy-sis is drawing conclusions.

5. Presentation: Includes the reporting of findings in an integrative review that can be in the form of tables or diagrams, and should include implications for practice, policy and research.

Norris and Atkins (2005) examined 49 reviews released during a five-year period by the Evidence-based Practice Centers that included evidence from studies using designs other than RCTs. Those designs included cohort studies and time series, before-after case series studies and cross-sectional studies. The authors suggested that one benefit to incorporating cohort studies and case series studies in reviews is that these types of designs are better able to capture long-term outcomes than are RCTs. They observed that while most of the 49 reviews incorporated non-randomized studies, the evidence tended to be from the RCTs. Assessing the quality of non-randomized trials was a challenge. Of the 49 reviews, 25% did not assess quality, 16% used published checklists or scoring systems, 10% adapted pub-lished scoring instruments and 49% used checklists that had been developed by the review-ers. Few of the tools had been tested for validity. To ensure adherence to principles of best evidence and to reduce bias, Norris and Atkins suggested that reviews begin with a detailed protocol which would then be strictly followed from review inception to completion. As well, the search strategy may need to be more fluid and repeated to compensate for the lack of sensitive and specific search strategies used for RCTs. The authors made the following recommendations:

1. Reviewers should assess the availability of RCTs addressing their review question before determining final inclusion criteria.

2. When considering the inclusion of non-randomized trials, reviewers need to consid-er potential biases and whether those biases can be minimized in well-conducted non-randomized trials.

3. The review process must be transparent so that reasons for inclusion and exclusion of study designs are explicit.

4. Reviewers should consider quality assessment of individual studies.

25

5. Reviewers should be aware of the impact of their inclusion criteria on the findings, and include a discussion of the potential impacts of bias on their conclusions.

6. Methodological quality of the included studies needs to be part of the discussion and conclusions, regardless of study design.

Jackson and Waters (2005) looked specifically at the challenges faced in writing systematic reviews for the public health sector. Some of the inherent difficulties included multi-compo-nent interventions, multiple outcomes measured, diverse populations and mixed study de-signs. They agreed with others who say that in public health, the intervention and the popu-lation need to dictate the study design, not vice versa. The researchers acknowledged the controversy around the appropriateness of the systematic review process for public health intervention studies. They also offered suggestions for improving the process as a means of moderating that criticism. Searching for literature is complex and time consuming. Therefore, it is essential to allow sufficient time for an adequate search, as well as to search multiple electronic databases and sources of grey literature. Quality assessment of all included ar-ticles is necessary, and they recommended using the tool developed by the Effective Public Health Practice Project, Hamilton, Ontario (Thomas et al., 2004a) “Quality Assessment Tool for Quantitative Studies.” This strong instrument (Deeks et al., 2003) can be used on RCTs, quasi-experimental and uncontrolled studies. Attention to theoretical frameworks in both primary studies and systematic reviews can help explain the differences that occur between the planning and the outcomes of the interventions. Measuring the integrity of inventions can help determine if an intervention was ineffective because of poor planning, or because the delivery of the intervention was incomplete. Due to the diverse populations receiving public health interventions, researchers should expect heterogeneity. Researchers and reviewers will not find all these components readily incorporated in all primary studies, but aware-ness of these components may help to strengthen systematic reviews emerging from public health intervention research.

Atkins and DiGuiseppi’s 1998 paper examined research needs about preventative health services use. Evidence from RCTs was not always available in the area of health promotion and preventative health. The researchers cited the known problems associated with obser-vational studies as reason for caution; however, they acknowledged that these studies can add to the body of knowledge. The strengths of observational studies include:

establishing linkages in the causal pathway;• building understanding of the natural history of disease;• identifying risk factors;• measuring compliance with and adverse effects of treatments;• determining accuracy of diagnostic tests;• assessing efficacy of interventions. •

Risk of bias can be modified with representative, population-based samples and careful con-sideration of potential confounders. Researchers also need to recognise that observational studies tend to exaggerate beneficial effects of treatment. The authors recommend including a large number of well-designed studies with consistent and preferably large effects when using observational studies to build treatment recommendations.

26

Section Summary

Synthesizing and integrating quantitative data from studies without control groups is chal-lenging. These six papers reported on various methods to incorporate such data in a useful format. The authors agreed that there is benefit to incorporating evidence from diverse study designs, but quality assessment is paramount for providing trustworthy evidence. Primary studies that clearly identify potential biases and probable impacts on effect can be usefully incorporated into systematic reviews.

Combining Qualitative and Quantitative Studies

This section examines three articles that describe methods for synthesizing data from both qualitative and quantitative studies.

Determining the quality of non-experimental studies is challenging and controversial; how-ever, these types of studies often contain important information for public health. Assessing quality using standard tools is not particularly helpful because few of these studies would score in the high or moderate category. Harden et al. (2004) used a standard approach for methodological quality assessment in systematic reviews on 35 primary non-intervention studies. Only four met all seven criteria used to assess quality. They developed a data extraction tool to help deconstruct each study (both quantitative and qualitative). From this, they reconstructed the results and presented them in structured summaries and evidence ta-bles. The synthesis process was non-linear and involved two reviewers going back and forth between the papers, the data extraction and the evidence tables. Pooling results, as done in meta-analysis, was not appropriate. Instead, these researchers used aggregate methods according to identified themes to synthesize findings. To answer the main research question, the reviewers integrated the findings from the non-intervention studies with 36 intervention studies, comparing the results to identify similarities and differences (Oliver et al., 2005). Using these two types of results allowed for greater breadth of perspectives and deeper understanding of public health issues from the point of view of the people who receive the targeted interventions.

Meta-reviews, as described by Wong and Raabe (1996), incorporate both qualitative and quantitative data. The authors suggested using a meta-review to address some of the limi-tations associated with traditional qualitative reviews, such as subjectivity in article selec-tion and weights assigned to individual studies, and lack of quantitative analysis. This was achieved by including the meta-analysis of quantitative data within a narrative review. The first factor for determining whether a study should be included is the study design; studies with different designs should not be grouped together for meta-analysis. Meta-analysis can increase statistical power, but it cannot compensate for other study design methodological issues. There are some basics steps in conducting a meta-review:

1. Define the research question.2. Conduct a comprehensive literature review.3. Create a priori inclusion criteria.4. Conduct a traditional qualitative review.

27

5. Conduct a quantitative meta-analysis.6. Integrate a traditional qualitative review with quantitative meta-analysis.7. Apply criteria for causation in interpretation.

The authors suggest that incorporating a meta-analysis with a traditional narrative review can reduce subjectivity.

A 2007 paper by Sandelowski, Barroso, & Voils, discussed the need for health disciplines to include both qualitative and quantitative research findings in reviews, especially in light of the growth of mixed methods research design. The authors examined 42 reports, including jour-nal articles, unpublished dissertations/theses and technical reports. To differentiate their ap-proach from a standard systematic review, no report was excluded based on methodological quality. To synthesize their findings, they developed a qualitative meta-synthesis model that grouped primary research findings in topical and thematic units. The meta-summary included the extracting, grouping and formatting of findings, as well as the calculating of frequency and effect sizes for the quantitative data. This approach can be used to synthesize mixed meth-ods surveys in which the data collection and analysis procedures are similar.

Section Summary

Incorporating qualitative and quantitative data in one report can be helpful for building an understanding of the success or failure of interventions. However, assessing the quality of studies without control groups can be problematic, especially if quality is assessed with traditional methods used in systematic reviews or meta-analysis. Results of quality assess-ment were not used as exclusion criteria, but were instead incorporated in the discussion of findings. Grouping findings according to a thematic and aggregate method is an appropri-ate alternative approach to meta-analysis. Researchers could incorporate some systematic review processes, such as having two people review and perform data extraction, but most authors suggest using a narrative approach for presenting results when meta-analysis is not appropriate.

Studies Comparing Results Using Differing Methods

The search included two studies that examined issues that arise when comparing results of randomized and non-randomized studies of the same intervention.

Jefferson and Demicheli (1999) assessed the capability of different research designs for testing four aspects of vaccine performance (immunogenicity, duration of immunity con-ferred, incidence and seriousness of side effects, and number of infections prevented by vaccination). Their findings indicated that in vaccinology, experimental and non-experimental study designs are frequently complementary. However, in some situations, vaccine quality could only be measured with one type of study. For instance, an RCT is the preferred study design to measure side effects. Non-experimental designs are appropriately applied when:

an experiment is impossible, unnecessary or inappropriate;• the individual efficacy is to be measured in terms of infrequent adverse events;• when interventions prevent rare events or the population effectiveness of an inter-•

28

vention to be measured in the long term.

Using interview data collected from 17 authors or users of Health Technology Assessments (HTA), Rotstein and Laupacis (2004) shed light on the differences between HTAs and sys-tematic reviews. These differences include:

1. Methodological standards—HTAs may include literature of poor methodological quality if a topic is important to decision-makers.

2. Replication of previous studies—systematic reviews do not need to be repeated if previous studies were of high-quality or when there is no new high-quality evi-dence; in HTAs there is often a need to repeat studies to defend the report’s con-clusions.

3. Choice of topics—topics are more policy-oriented with HTAs, while systematic re-views tend to be driven by effectiveness questions.

4. Inclusion of content experts (in systematic reviews) and policy-makers (in HTAs) as authors.

5. Inclusion of economic evaluations—in HTAs.6. Making policy recommendations—in HTAs.7. Dissemination of report—more often actively done for HTAs.

HTAs are not specifically designed for evaluating scientific evidence, while systematic re-views are designed for this purpose. Yet the impartial nature of scientific investigation would help strengthen the policy decisions emerging from HTA evidence. Different levels of evi-dence (below RCTs in the hierarchy) are more acceptable in HTAs.

Section Summary

These two comparisons of results from different designs about the same intervention point out that it is useful to consider different designs in reviews, not only because they may be the best available information, but also because they can more likely answer questions re-lated to long-term effectiveness, adverse events and rare events. The strengths of including other designs outweigh the limitations.

29

Discussion

The main research focus for this paper was the methods of including results from non-randomized trials, as well as the quality and synthesis of that data. This is an important research question because in many areas of health and public health, evidence from ran-domized control trials is not available. Historically, evidence emerging from non-randomized trials has been criticized as unreliable due to a greater potential of bias. It is commonly understood that there is a greater tendency for selection, allocation or attrition bias in non-randomized trials. Readers of data from non-randomized trials need to be aware that these biases may exist, and researchers should account for these potential biases in their studies. Some suggestions emerging from the studies examined in this paper include:

There is a need for a strong and transparent study design in all non-randomized • trials.The research question should determine the study design.• Study authors need to incorporate detailed information about the study; lack of • information means that readers cannot determine what has or has not been done in the study design.There is a need for consistency in terminology and descriptions across non-ran-• domized studies to facilitate comparisons.

There are several methodological approaches for including data from non-randomized stud-ies. We found the work of Whittemore and Knafl (2005) to be most systematic. Whittemore and Knafl provide a five-step integrative review method that incorporates problem identifica-tion, literature search, data evaluation, data analysis and presentation of data.

Many researchers use various assessment tools to determine the quality of the included studies. Norris and Atkins (2005) provide a comprehensive list of recommendations for assessing study quality that emerged from the 49 reviews they examined. These recom-mendations were outlined earlier in this paper. Other authors used existing checklists or scoring instruments, such as MINORS or GRADING. Some adapted tools to conform to the study designs found in the primary studies, while others developed new tools that reflected the information to be extracted from included studies. Many of these tools have not been tested for validity or reliability. We recommend that further research be done to determine the validity and reliability of these quality assessment tools for non-randomized trials. Where possible, reviewers should use tools that have been tested and determined to be valid and reliable for non-randomized trials, such as the Quality Assessment Instrument for Primary Studies developed by the Effective Public Health Practice Project, Hamilton, Ontario.

Including data from non-randomized trials is challenging; however, several benefits can be achieved from its inclusion. Drawing from non-randomized studies means that researchers can add a wider body of literature in their search strategies (Thomas et al., 2004). Incorpo-rating these studies may broaden the scope of the questions that can be answered. This research can shed light on why, how and for whom interventions or programs succeed or fail (Oliver et al., 2005).

30

Recommendations

There is currently no standardized model for synthesizing the results of studies that do not have control groups. This paper examined synthesis methods, critical appraisal tools and studies that deal with appraising and synthesizing the results from quantitative studies that lack control groups.

This paper outlined some precautions to take when using non-controlled studies as evi-dence, and made recommendations for processes and tools:

1. Incorporate studies without control groups into systematic reviews, especially when there are no other studies to consider. Studies without control groups can also pro-vide information on long-term effectiveness, rare events and adverse effects.

2. Do not generalize the direction of differences in outcome between randomized and non-controlled studies. While effect sizes are more often smaller in randomized controlled trials (RCTs), some show the reverse or the same results across study designs.

3. Use Whittemore and Knafl’s (2005) approach to include results of non-randomized trials. Whittemore and Knafl provide a five-step integrative review method that incorporates problem identification, literature search, data evaluation, data analysis and presentation of data.

4. Recommendations for methods of systematic review including studies without con-trol groups:

a) Use a critical appraisal tool that has been tested for internal consistency, test-retest and inter-rater reliability, and at least face and criterion validity (Downs & Black, 1998; Slim, Nini, Forestier et al., 2003; Steuten, Vrijhoef, van-Merode et al., 2004; Thomas, Ciliska, Dobbins et al., 2004a; Zaza, Wright-De Aguero, Briss et al., 2000).

b) Do not meta-analyse results from observational studies.5. Suggestions to authors of primary studies:

a) Ensure a strong and transparent study design in all non-randomized trials.b) Confirm that the research question determines the study design.c) Incorporate detailed information about the study; lack of information means

that readers cannot determine what has or has not been done in the study design.

d) Ensure consistency in terminology and descriptions across non-random-ized studies to facilitate comparisons.

6. Implications for further research:a) Conduct reliability and validity testing of any critical appraisal tools. b) Conduct further studies to assess the different results obtained by random-

ized versus non-randomized trials. 7. Including data from non-randomized trials is challenging; however, several benefits

can be achieved:

31

a) A wider body of literature can be included in search strategies (Thomas, Harden, Oakley et al., 2004).

b) The scope of the questions that can be answered may be broadened. c) This research can shed light on why, how and for whom interventions or

programs succeed or fail (Oliver, Harden, Rees et al., 2005).

32

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esR

ange

l, et

al.

2003

US

A

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

To d

escr

ibe

the

deve

lopm

ent a

nd

pote

ntia

l app

licat

ions

of

a s

tand

ardi

zed

qual

ity a

sses

smen

t sc

ale

desi

gned

for

retro

spec

tive

stud

ies

in p

edia

tric

surg

ery.

Dev

elop

ed a

com

preh

ensi

ve q

ualit

y as

sess

men

t ins

trum

ent i

ncor

pora

ting

30 it

ems

with

in th

ree

subs

cale

s.

The

subs

cale

s w

ere

desi

gned

to a

sses

s cl

inic

al re

leva

nce,

repo

rting

met

hodo

logy

and

the

stre

ngth

of c

oncl

usio

ns.

Glo

bal q

ualit

y ra

tings

(poo

r, fa

ir or

goo

d) w

ere

deriv

ed b

y co

mbi

ning

sco

res

from

eac

h su

b-sc

ale.

Six

inde

pend

ent r

evie

wer

s ex

amin

ed in

ter-

rate

r rel

iabi

lity

by a

sses

sing

the

inst

rum

ent i

n 10

re

trosp

ectiv

e st

udie

s fro

m p

edia

tric

surg

ery

liter

atur

e.

Find

ings

:

The

inst

rum

ent h

ad e

xcel

lent

inte

r-ra

ter r

elia

bilit

y.•

Ther

e w

ere

high

leve

ls o

f agr

eem

ent f

or a

ll ite

ms

with

in in

stru

men

t (84

.6%

con

cord

ance

, n

•=

1573

item

s) a

nd fo

r all

indi

vidu

al s

ubsc

ales

(ran

ge, 7

3.3%

to 8

5.8%

, n =

60

to 1

258)

Con

clus

ions

:

The

auth

ors

have

dev

elop

ed a

sta

ndar

dize

d an

d •

relia

ble

qual

ity a

sses

smen

t sca

le fo

r the

ana

lysi

s of

retro

spec

tive

data

in p

edia

tric

surg

ery.

Res

ults

may

not

be

gene

raliz

able

to o

ther

inte

rven

tions

.

Pot

entia

l app

licat

ions

:

Pro

vide

pra

ctic

ing

surg

eons

with

a k

now

ledg

e •

base

to c

ritic

ally

eva

luat

e pu

blis

hed

retro

spec

-tiv

e da

ta;

Pro

vide

sta

ndar

dize

d m

etho

dolo

gy fo

r SR

of

•ex

istin

g re

tro d

ata;

Dev

elop

sta

ndar

dize

d re

porti

ng g

uide

lines

for

•us

e in

pee

r-re

view

ed jo

urna

ls.

Mar

getts

, et a

l.

1995

UK

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

To d

evel

op a

sco

ring

syst

em to

hel

p ju

dge

thescientificqual

-ity

of o

bser

vatio

nal

epid

emio

logi

c st

udie

s lin

king

die

t with

risk

of

canc

er.

The

scor

ing

syst

em w

as d

evel

oped

from

key

hea

ding

s us

ed in

dev

elop

ing

rese

arch

pro

toco

ls

and

incl

uded

que

stio

ns u

nder

hea

ding

s:

thre

e fo

r cas

e co

ntro

l stu

dies

(die

tary

ass

essm

ent,

recr

uitm

ent o

f sub

ject

s an

d an

alys

is) –

scor

ing

syst

em in

App

endi

x A

;

fourforcohortstudies(dietaryassessm

ent,definitionofcohort,ascertainmentandanaly

-•

sis)

– s

corin

g sy

stem

in A

ppen

dix

B.

Inte

r-ob

serv

er v

aria

tion

was

ass

esse

d:

Ther

e w

as g

ood

agre

emen

t bet

wee

n ob

serv

ers

in th

e ra

nkin

g of

stu

dies

.

The

scor

ing

syst

em w

as a

pplie

d to

a re

view

of m

eat a

nd c

ance

r ris

k:

From

whatseemedlikealargeliteraturesample,relativelyfewstudiesscoredwell(definedas

a sc

ore

> 65

%).

How

ever

, the

se s

tudi

es te

nded

to p

rovi

de m

ore

cons

iste

nt in

form

atio

n.

For c

ase

cont

rol s

tudi

es, 3

4 ou

t of 1

06 s

tudi

es s

core

d >

65%

.

For c

ohor

t stu

dies

, 10

out o

f 41

stud

ies

scor

ed >

65%

.

The

prot

otyp

e sc

orin

g sy

stem

repo

rted

here

hel

ped

the

auth

ors

desc

ribe

the

epid

emio

logi

c da

ta o

n di

et

and

canc

er, b

ut it

is n

ot g

ener

aliz

able

to o

ther

set

-tin

gs o

r int

erve

ntio

ns.

Table summarizing results of relevant papers

33

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esD

owns

& B

lack

1998

UK

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

To te

st th

e fe

asib

ility

of

cre

atin

g a

valid

and

re

liabl

e ch

eckl

ist w

ith

the

follo

win

g fe

atur

es:

App

ropr

iate

for a

s-se

ssin

g bo

th ra

ndom

-iz

ed a

nd n

on-r

ando

m-

ized

stu

dies

.

Pro

vide

s bo

th a

n ov

eral

l sco

re fo

r stu

dy

qualityandaprofile

of s

core

s, n

ot o

nly

for

the

qual

ity o

f rep

ortin

g,

inte

rnal

val

idity

and

po

wer

, but

als

o fo

r ex

tern

al v

alid

ity.

The

stud

y be

gan

with

a p

ilot v

ersi

on o

f the

tool

. The

rese

arch

ers

aske

d th

ree

expe

rts to

eva

lu-

ate

face

and

con

tent

val

idity

. Afte

rwar

d, tw

o ra

ters

use

d th

e to

ol to

ass

ess

10 ra

ndom

ized

and

10

non

-ran

dom

ized

stu

dies

to m

easu

re re

liabi

lity.

Find

ings

:

Usi

ng d

iffer

ent r

ater

s, th

e ch

eckl

ist w

as re

vise

d an

d te

sted

for i

nter

nal c

onsi

sten

cy (K

uder

-•

Ric

hard

son

20),

test

-ret

est a

nd in

ter-

rate

r rel

iabi

lity,

crit

erio

n va

lidity

and

resp

onde

nt

burd

en.

The

rese

arch

ers

foun

d hi

gh in

tern

al c

onsi

sten

cy in

Qua

lity

Inde

x.

Test

-ret

est a

nd in

ter-

rate

r rel

iabi

lity

of th

e Q

ualit

y In

dex

wer

e go

od.

Con

clus

ions

:

It is

pos

sibl

e to

pro

duce

a c

heck

list t

hat e

valu

-•

ates

rand

omiz

ed a

nd n

on-r

ando

miz

ed s

tudi

es.

The

mai

n co

ncer

n is

with

the

chec

klis

t’s e

xter

nal

•va

lidity

and

app

licat

ion

of it

to to

pics

bey

ond

the

stud

y’s

purp

ose.

Slim

, et a

l.

2003

Fran

ce

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

To d

evel

op a

nd

valid

ate

a M

etho

d-ol

ogic

al In

dex

for N

on-

Ran

dom

ized

Stu

dies

(M

INO

RS

) whi

ch c

ould

be

use

d by

read

ers,

m

anus

crip

t rev

iew

ers

or jo

urna

l edi

tors

to

asse

ss th

e qu

ality

of

such

stu

dies

.

The

inde

x co

nsis

ted

of 1

2 ite

ms.

In th

e ca

se o

f non

-com

para

tive

stud

ies,

item

s in

clud

ed:

a st

ated

aim

of t

he s

tudy

;•

incl

usio

n of

con

secu

tive

patie

nts;

pros

pect

ive

colle

ctio

n of

dat

a;•

endp

oint

app

ropr

iate

to th

e st

udy

aim

;•

unbi

ased

eva

luat

ion

of e

ndpo

ints

;•

follo

w-u

p pe

riod

appr

opria

te to

the

maj

or e

ndpo

int;

loss

to fo

llow

-up

not e

xcee

ding

5%

.•

In th

e ca

se o

f com

para

tive

stud

ies,

item

s in

clud

ed:

a co

ntro

l gro

up h

avin

g th

e go

ld s

tand

ard

inte

rven

tion;

cont

empo

rary

gro

ups;

base

line

equi

vale

nce

of g

roup

s;•

pros

pect

ive

calc

ulat

ion

of th

e sa

mpl

e si

ze;

stat

istic

al a

naly

ses

adap

ted

to th

e st

udy

desi

gn.

Theindexhadgoodinter-review

eragreement,hightest-retestreliabilitybythekappa-coeffi-

cientandgoodinternalconsistencybyCronbach’salphacoefficient.

MIN

OR

S is

a v

alid

inst

rum

ent d

esig

ned

to a

sses

s th

e m

etho

dolo

gica

l qua

lity

of n

on-r

ando

miz

ed s

urgi

cal

stud

ies,

whe

ther

com

para

tive

or n

on-c

ompa

rativ

e.

Res

earc

hers

may

nee

d to

use

cau

tion

if ap

plyi

ng th

e in

stru

men

t to

non-

surg

ical

inte

rven

tions

.

34

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esZa

za e

t al.

2000

US

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

1. T

o as

sist

with

co

nsis

tenc

y, re

duce

bi

as a

nd im

prov

e va

lidity

for r

esea

rch-

ers

usin

g th

e G

uide

to

Com

mun

ity

Pre

vent

ion

Ser

vice

s:

Sys

tem

atic

Rev

iew

s an

d E

vide

nce-

Bas

ed

Rec

omm

enda

tions

(th

e G

uide

)

2. T

o de

velo

p a

stan

dard

ized

dat

a ex

tract

ion

form

to u

se

with

the

Gui

de.

The

form

was

dev

elop

ed b

y:

revi

ewin

g m

etho

dolo

gies

from

oth

er s

yste

mat

ic re

view

s;•

repo

rting

sta

ndar

ds e

stab

lishe

d by

hea

lth a

nd s

ocia

l sci

ence

jour

nals

;•

exam

inin

g ev

alua

tion,

sta

tistic

al a

nd m

eta-

anal

ysis

lite

ratu

re;

solic

iting

exp

ert o

pini

ons.

The

form

was

use

d to

ass

ess

the

met

hodo

logi

cal q

ualit

y of

prim

ary

stud

ies

(23

ques

tions

) and

to

cla

ssify

and

des

crib

e ke

y ch

arac

teris

tics

of th

e in

terv

entio

n an

d ev

alua

tion

(26

ques

tions

).

The

rese

arch

ers

exam

ined

stu

dy p

roce

dure

s an

d re

sults

and

ass

esse

d th

reat

s to

val

idity

ac

ross

six

cat

egor

ies:

inte

rven

tion

and

stud

y de

scrip

tions

sam

plin

g•

mea

sure

men

t•

anal

ysis

inte

rpre

tatio

n of

resu

lts•

othe

r exe

cutio

n is

sues

This

pro

cess

impr

oved

the

valid

ity a

nd re

liabi

lity

of

the

Gui

de.

The

stan

dard

ized

dat

a ex

tract

ion

form

can

ass

ist

rese

arch

ers

and

read

ers

of p

rimar

y st

udie

s to

revi

ew

the

cont

ent a

nd q

ualit

y of

the

stud

ies.

The

form

can

al

so a

ssis

t in

the

deve

lopm

ent o

f man

uscr

ipts

for

subm

issi

on to

pee

r rev

iew

ed jo

urna

ls.

Gre

er, e

t al.

2000

US

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

(evi

denc

e gr

adin

g sy

stem

s)

To d

escr

ibe

in d

etai

l th

e ev

iden

ce a

nd

conc

lusi

on g

radi

ng

syst

em d

evel

oped

by

the

Inst

itute

for C

linic

al

Sys

tem

s Im

prov

emen

t (IC

SI)

for u

se b

y th

e pr

actic

ing

clin

icia

ns

who

writ

e th

e do

cu-

men

ts a

nd u

se th

em

in m

akin

g de

cisi

ons

abou

t pat

ient

car

e

The

grad

ing

syst

em in

clud

ed a

n ev

alua

tion

of in

divi

dual

rese

arch

repo

rts a

nd a

n as

sess

men

t of

the

over

all s

treng

th o

f the

evi

denc

e su

ppor

ting

a pa

rticu

lar c

oncl

usio

n or

reco

mm

enda

tion.

The

ratin

g sy

stem

to a

sses

s th

e qu

ality

of i

ndiv

idua

l res

earc

h re

ports

can

be

quic

kly

unde

rsto

od

and

mor

e ea

sily

use

d by

pra

ctic

ing

phys

icia

ns.

The

inte

nt o

f the

gra

ding

sys

tem

is to

hig

hlig

ht a

repo

rt th

at is

unu

sual

ly s

trong

or m

arke

dly

flawed.

The

evid

ence

gra

ding

sys

tem

:

incl

udes

the

conc

lusi

on g

radi

ng w

orks

heet

, whi

ch c

alls

for s

tate

men

t of a

con

clus

ion,

a

•su

mm

ary

of re

sear

ch re

ports

that

sup

port

or d

ispu

te th

e co

nclu

sion

, ass

ignm

ent o

f cla

sses

an

d qu

ality

mar

kers

to th

e re

sear

ch re

ports

, and

ass

ignm

ent o

f a g

rade

to th

e co

nclu

sion

;

has

been

use

d in

the

writ

ing

of m

ore

than

40

guid

elin

es a

nd n

umer

ous

tech

nolo

gy a

sses

s-•

men

t rep

orts

.

Con

clus

ions

:

The

syst

em a

ppea

rs to

be

succ

essf

ul in

over

com

ing

the

com

plex

ity o

f som

e pu

blis

hed

syst

ems

of g

radi

ng e

vide

nce,

whi

le s

till y

ield

ing

a defensibleclassificationofconclusionsbasedon

the

stre

ngth

of t

he u

nder

lyin

g ev

iden

ce.

The

syst

em’s

relia

bilit

y ne

eds

to b

e rig

orou

sly

•te

sted

.

35

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esS

teut

en, e

t al.

2004

Net

herla

nds

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

To d

escr

ibe

to w

hat

exte

nt e

xist

ing

inst

rum

ents

are

use

-fu

l in

asse

ssin

g th

e m

etho

dolo

gica

l qua

lity

of H

ealth

Tec

hnol

ogy

Ass

essm

ent (

HTA

) of

dise

ase

man

agem

ent.

Problem

swereidentifiedintheassessmentofthemethodologicalqualityofsixHTA

sofdis

-ea

se m

anag

emen

t with

thre

e di

ffere

nt in

stru

men

ts. T

he p

robl

ems

mai

nly

conc

erne

d:

stud

y de

sign

(RC

T ve

rsus

obs

erva

tiona

l stu

dies

); •

crite

ria fo

r sel

ectio

n an

d re

stric

tion

of p

atie

nts;

base

line

and

outc

ome

mea

sure

s (s

ever

al p

aram

eter

s ar

e ne

cess

ary)

;•

blin

ding

of p

atie

nts

and

prov

ider

s (w

hich

is im

poss

ible

in d

isea

se m

anag

emen

t);•

desc

riptio

n of

com

plex

, mul

tifac

eted

inte

rven

tions

and

avo

idan

ce o

f co-

inte

rven

tions

.•

The

rese

arch

ers

prop

osed

and

val

idat

ed a

new

inst

rum

ent,

HTA

-DM

, tha

t:

incl

udes

four

com

pone

nts

(stu

dy p

opul

atio

n, d

escr

iptio

n of

inte

rven

tion,

mea

sure

men

t of

•ou

tcom

es a

nd d

ata-

anal

ysis

/pre

sent

atio

n of

dat

a).

cont

ains

15

item

s (o

n in

tern

al v

alid

ity, e

xter

nal v

alid

ity, s

tatis

tical

con

side

ratio

ns).

Cho

u &

Hel

fand

2005

US

A

MET

HO

DS

(Sys

tem

atic

R

evie

ws

(SR

) of

har

ms)

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

To re

view

the

met

hodo

logi

cal c

hal-

leng

es in

per

form

ing

SR

s of

har

ms

and

high

light

exa

mpl

es o

f ap

proa

ches

to th

em

from

96

EP

C e

vide

nce

repo

rts.

Som

e st

udie

s ha

ve fo

und

that

obs

erva

tiona

l stu

dies

and

RC

Ts re

port

sim

ilar e

stim

ates

of e

f-fe

cts.

Oth

ers

have

foun

d th

at c

linic

al tr

ials

repo

rt hi

gher

risk

s fo

r adv

erse

eve

nts

than

obs

erva

tiona

l st

udie

s (p

ossi

bly

due

to p

oore

r ass

essm

ent o

f har

m in

obs

erva

tiona

l stu

dies

, or t

hey

may

be

mor

e lik

ely

to b

e pu

blis

hed

if th

ey re

port

good

met

hods

/ res

ults

).

Incl

udin

g la

rge

data

base

s m

ay p

rovi

de u

sefu

l inf

orm

atio

n ab

out h

arm

.

Dat

a fro

m p

ract

ice-

base

d ne

twor

ks a

re o

ften

riche

r in

clin

ical

det

ail t

han

adm

inis

trativ

e da

taba

s-es—itmaybepossibletoidentifyandmeasurelikelyconfounderswithmoreconfidence,but

theyarehardertofindusingelectronicsearchesbecausemanysuchanalysesareproprietary.

Incl

udin

g ca

se re

ports

can

hel

p id

entif

y un

com

mon

, une

xpec

ted

or lo

ng-te

rm a

dver

se d

rug

even

ts th

at a

re o

ften

diffe

rent

from

thos

e de

tect

ed in

clin

ical

tria

ls.

Ass

essi

ng th

e qu

ality

of h

arm

repo

rting

:

Sev

eral

SR

s fo

und

that

pro

spec

tive

or re

trosp

ectiv

e de

sign

, cas

e-co

ntro

l com

pare

d w

ith

•co

hort

stud

ies,

and

sm

alle

r com

pare

d w

ith la

rger

cas

e se

ries

had

no c

lear

effe

ct o

n es

ti-m

ates

of h

arm

.

Bet

ter d

ata

abou

t har

m is

nee

ded

to c

ondu

ct b

al-

ance

d S

Rs.

Furth

er re

sear

ch is

nee

ded

to e

mpi

rical

ly d

eter

min

e th

e im

pact

of i

nclu

ding

dat

a fro

m d

iffer

ent s

ourc

es o

n as

sess

men

ts o

f har

m, a

nd fu

rther

dev

elop

and

test

cr

iteria

for r

atin

g th

e qu

ality

of h

arm

repo

rting

.

It is

impo

rtant

to:

avoi

d in

appr

opria

te s

tatis

tical

com

bina

tions

of

•da

ta;

care

fully

des

crib

e th

e ch

arac

teris

tics

and

qual

ity

•of

incl

uded

stu

dies

;

thor

ough

ly e

xplo

re p

oten

tial s

ourc

es o

f het

ero-

•ge

neity

.

36

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esM

cInt

osh,

et a

l.

2004

UK

MET

HO

DS

(SR

s)

To p

rese

nt th

e ex

peri-

ence

of c

ondu

ctin

g sy

stem

atic

revi

ews

(SR

s) o

f har

mfu

l ef

fect

s an

d to

mak

e su

gges

tions

for f

utur

e pr

actic

e an

d fu

rther

re

sear

ch.

The

auth

ors

eval

uate

d th

e m

etho

ds u

sed

in th

ree

SR

s, fo

cusi

ng o

n th

e re

view

que

stio

n, s

tudy

de

sign

s an

d qu

ality

ass

essm

ent.

Rev

iew

que

stio

n:

Onereview

hadaspecificquestion,focusedonprovidinginformationonspecificharmful

•ef

fect

s to

furn

ish

an e

cono

mic

mod

el; t

he o

ther

two

revi

ews

had

broa

der q

uest

ions

.

Stu

dy d

esig

ns:

Allthreeincludedrandom

izedandobservationaldata,buteachdefinedtheinclusion

•cr

iteria

diff

eren

tly.

Qua

lity

asse

ssm

ent:

App

lied

publ

ishe

d ch

eckl

ists

—en

coun

tere

d pr

oble

ms;

Inad

equa

te re

porti

ng o

f bas

ic d

esig

n fe

atur

es o

f the

prim

ary

stud

ies—

chec

klis

ts o

mit

key

•fe

atur

es, s

uch

as h

ow h

arm

ful e

ffect

s da

ta w

ere

reco

rded

.

Key

are

as fo

r im

prov

emen

t inc

lude

:

focu

sing

the

revi

ew q

uest

ion,

as

broa

d an

d no

n-•specificquestionsarenothelpfultotheprocess

of re

view

;

deve

lopi

ng s

tand

ardi

zed

met

hods

for t

he q

ualit

y •

asse

ssm

ent o

f stu

dies

of h

arm

ful e

ffect

s.

Atk

ins,

et a

l.

2005

US

A

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

(gra

ding

sys

tem

)

To p

ilot t

est a

nd fu

rther

de

velo

p th

e G

RA

DE

ap

proa

ch to

gra

ding

ev

iden

ce a

nd re

com

-m

enda

tions

.

The

GR

AD

E a

ppro

ach

was

dev

elop

ed b

y th

e G

RA

DE

Wor

king

Gro

up to

gra

de th

e qu

ality

of

evid

ence

and

the

stre

ngth

of r

ecom

men

datio

ns.

Researchersused12evidenceprofilesinthispilotstudy.

Eachprofilewasmadebasedoninformationavailableinasystematicreview

.

17 p

eopl

e in

depe

nden

tly g

rade

d th

e le

vel o

f evi

denc

e an

d th

e st

reng

th o

f rec

omm

enda

tions

for

eachofthe12evidenceprofiles.

Qua

lity

of e

vide

nce

for e

ach

outc

ome:

Res

earc

hers

foun

d th

at in

add

ition

to s

tudy

des

ign,

qua

lity,

con

sist

ency

and

dire

ctne

ss,

•otherqualitycriteriaalsoinfluencedjudgmentsaboutevidence;sparsedata,strongas-

soci

atio

ns, p

ublic

atio

n bi

as, d

ose

resp

onse

and

situ

atio

ns w

here

all

plau

sibl

e co

nfou

nder

s strengthenedratherthanweakenedconfidenceinthedirectionoftheeffect.

Rel

iabi

lity:

Ther

e w

as a

var

ied

amou

nt o

f agr

eem

ent o

n th

e qu

ality

of e

vide

nce

for t

he o

utco

mes

•relatingtoeachofthetwelvequestions(kappacoefficientsforagreementbeyondchance

rang

ed fr

om 0

to 0

.82)

.

Ther

e w

as fa

ir ag

reem

ent a

bout

the

rela

tive

impo

rtant

of e

ach

outc

ome.

Therewaspooragreementaboutthebalanceofbenefitsandharmsandrecommendations

•(c

ould

, in

part,

be

expl

aine

d by

the

accu

mul

atio

n of

all

the

prev

ious

diff

eren

ces

in g

radi

ng

of th

e qu

ality

and

impo

rtanc

e of

the

evid

ence

).

Mos

t of t

he d

iscr

epan

cies

wer

e ea

sily

reso

lved

thro

ugh

disc

ussi

on.

37

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esK

han,

et a

l.

2001

UK

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

To p

rovi

de b

ackg

roun

d in

form

atio

n on

stu

dy

qual

ity a

nd d

escr

ibe

how

to d

evel

op

qual

ity a

sses

smen

t ch

eckl

ists

.

The

rese

arch

ers

desc

ribed

reas

ons

to in

clud

e st

udy

qual

ity a

sses

smen

t in

revi

ews,

type

s of

bi

ases

, som

e m

etho

ds to

pro

tect

aga

inst

bia

s in

prim

ary

effe

ctiv

enes

s st

udie

s an

d ho

w q

ualit

y as

sess

men

t ins

trum

ents

sho

uld

be d

evel

oped

.

Qua

lity

crite

ria fo

r ass

essm

ent o

f obs

erva

tiona

l stu

dies

incl

udes

ans

wer

ing

a se

ries

of q

ues-

tions

for c

ohor

t stu

dies

, cas

e-co

ntro

l stu

dies

and

cas

e se

ries.

Coh

ort s

tudi

es:

Istheresufficientdescriptionofthegroupsandthedistributionofprognosticfactors?

Arethegroupsassem

bledatasimilarpointintheirdiseaseprogression?

Istheintervention/treatmentreliablyascertained?

Werethegroupscom

parableonallimportantconfoundingfactors?

Wasthereadequateadjustmentfortheeffectsoftheseconfoundingvariables?

Wasadose-responserelationshipbetweeninterventionandoutcom

edemonstrated?

Wasoutcomeassessmentblindtoexposurestatus?

Wasfollow-uplongenoughfortheoutcomestooccur?

Whatproportionofthecohortw

asfollowed-up?

Wer

e dr

op-o

ut ra

tes

and

reas

ons

for d

rop-

out s

imila

r acr

oss

inte

rven

tion

and

unex

pose

d •groups?

Cas

e-co

ntro

l stu

dies

:

Isthecasedefinitionexplicit?

Hasthediseasestateofthecasesbeenreliablyassessedandvalidated?

Werethecontrolsrandom

lyselectedfromthesourceofpopulationofthecases?

How

com

parablearethecasesandcontrolswithrespecttopotentialconfoundingfactors?

Wereinterventionsandotherexposuresassessedinthesamewayforcasesandcontrols?

Con

clus

ions

:

In a

revi

ew th

at in

corp

orat

es th

ese

stud

y de

-•signs,itcanbedifficulttodetermineifdifferences

are

a re

sult

of th

e in

terv

entio

n or

gro

up in

com

pat-

ibili

ties.

Res

ults

sho

uld

be v

iew

ed w

ith c

autio

n.•

38

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esTh

omas

, et a

l.

2004

Can

ada

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

MET

HO

DS

(SR

s)

To d

escr

ibe

four

issu

es

rela

ted

to s

yste

mat

ic

liter

atur

e re

view

s of

th

e ef

fect

iven

ess

of

publ

ic h

ealth

nur

sing

in

terv

entio

ns:

1. p

roce

ss o

f sys

tem

-at

ical

ly re

view

ing

the

liter

atur

e;

2. d

evel

opm

ent o

f a

qual

ity a

sses

smen

t in

stru

men

t;

3. th

e m

etho

ds/

resu

lts o

f the

EP

HP

P to

dat

e;

4. s

ome

met

hods

/re

sults

of t

he d

is-

sem

inat

ion

used

.

Dom

ains

for a

sses

sing

obs

erva

tiona

l stu

dies

:

stud

y qu

estio

n•

stud

y po

pula

tion

com

para

bilit

y of

sub

ject

s•

expo

sure

/inte

rven

tion

outc

ome

mea

sure

blin

ding

stat

istic

al a

naly

sis

resu

lts•

disc

ussi

on•

fund

ing

Com

pone

nts

of th

e E

PH

PP

Qua

lity

Ass

essm

ent T

ool:

sele

ctio

n bi

as•

desi

gn•

conf

ound

ers

blin

ding

data

col

lect

ion

met

hods

with

draw

als

and

drop

-out

s•

This

mod

el o

f sco

ring

has

been

test

ed a

s a

relia

ble

and

valid

tool

.

This

sco

ring

syst

em in

dica

tes

a pr

efer

ence

for R

CTs

an

d C

CTs

, whi

ch w

ill te

nd to

sco

re h

ighe

r tha

n no

n-ra

ndom

ized

stu

dies

.

The

best

sco

re th

at c

an b

e ac

hiev

ed b

y ob

serv

atio

nal

stud

ies

is “m

oder

ate,

” and

onl

y if

all o

ther

com

po-

nent

s of

the

stud

y ar

e ve

ry w

ell d

esig

ned.

Ram

say,

et a

l.

2003

US

A

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

MET

HO

DS

(inte

rrup

ted

time

serie

s (IT

S)

desi

gns)

To c

ritic

ally

revi

ew th

e m

etho

dolo

gica

l qua

l-ity

of I

TS (i

nter

rupt

ed

time

serie

s) d

esig

ns

usin

g st

udie

s in

clud

ed

in tw

o sy

stem

atic

re

view

s (a

revi

ew o

f m

ass

med

ic in

terv

en-

tions

and

a re

view

of

guid

elin

e di

ssem

ina-

tion

and

impl

emen

ta-

tion

stra

tegi

es).

Qua

lity

crite

ria fo

r ITS

des

igns

:

Inte

rven

tion

occu

rred

inde

pend

ently

of o

ther

cha

nges

ove

r tim

e.•

Inte

rven

tion

was

unl

ikel

y to

affe

ct d

ata

colle

ctio

n.•

The

prim

ary

outc

ome

was

ass

esse

d bl

indl

y or

was

mea

sure

d ob

ject

ivel

y.•

The

prim

ary

outc

ome

was

relia

ble

or w

as m

easu

red

obje

ctiv

ely.

The

com

posi

tion

of th

e da

ta s

et a

t eac

h tim

e po

int c

over

ed a

t lea

st 8

0% o

f the

tota

l num

ber

•of

par

ticip

ants

in th

e st

udy.

Theshapeoftheinterventioneffectwaspre-specified.

A ra

tiona

le fo

r the

num

ber a

nd s

paci

ng o

f dat

a po

ints

was

des

crib

ed.

The

stud

y w

as a

naly

sed

appr

opria

tely

usi

ng ti

me

serie

s te

chni

ques

.•

Find

ings

:

A to

tal o

f 66%

(n =

38)

of I

TS s

tudi

es d

id n

ot ru

le o

ut th

e th

reat

that

ano

ther

eve

nt c

ould

have

occ

urre

d at

the

poin

t of i

nter

vent

ion.

Rep

ortin

g of

fact

ors

rela

ted

to d

ata

colle

ctio

n, th

e pr

imar

y ou

tcom

e an

d co

mpl

eten

ess

of

•da

tase

t wer

e ge

nera

lly d

one

in b

oth

revi

ews.

All

the

stud

ies

wer

e co

nsid

ered

“effe

ctiv

e” in

the

orig

inal

repo

rt, b

ut a

ppro

xim

atel

y ha

lf of

•there-analysedstudiesshow

ednostatisticallysignificantdifferences.

Inte

rrup

ted

time

serie

s de

sign

s ar

e of

ten

anal

ysed

in

appr

opria

tely,

und

erpo

wer

ed a

nd p

oorly

repo

rted

in

impl

emen

tatio

n re

sear

ch.

39

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esC

onn

& R

antz

2003

US

A

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

MET

HO

DS

(qua

lity

and

stud

y ou

tcom

es)

1. T

o di

scus

s w

ays

that

rese

arch

ers

con-

ceiv

e of

and

ass

ess

qual

ity.

2. T

o de

term

ine

asso

-ci

atio

ns b

etw

een

stud

y qu

ality

and

out

com

es.

3. T

o de

term

ine

stra

te-

gies

for m

anag

ing

the

varie

d qu

ality

of

prim

ary

stud

ies

in a

m

eta-

anal

ysis

.

Ass

essi

ng m

etho

dolo

gica

l qua

lity:

No

sing

le s

tand

ard

exis

ts fo

r add

ress

ing

the

qual

ity v

aria

tions

in p

rimar

y st

udie

s.•

Tabl

e 1

– su

mm

ary

of c

omm

only

not

ed c

ompo

nent

s of

inte

rven

tion

rese

arch

qua

lity.

Inst

rum

ents

to m

easu

re q

ualit

y –

mor

e th

an 1

00 p

rimar

y st

udy

qual

ity s

cale

s ex

ist f

or

•m

easu

ring

the

qual

ity o

f prim

ary

stud

ies,

but

they

var

y dr

amat

ical

ly in

siz

e, c

ompo

sitio

n,

com

plex

ity a

nd e

xten

t of d

evel

opm

ent.

Qua

lity

mea

sure

men

t sca

les

have

dev

elop

men

tal a

nd a

pplic

atio

n pr

oble

ms.

Mos

t sca

les

resu

lt in

a s

ingl

e, o

vera

ll sc

ore

for q

ualit

y. O

nly

a fe

w s

cale

s co

ntai

n su

bsca

les

•thatprofilestrengthsandweaknesses.

Instrumentshaveprovendifficulttoapplyconsistently,eveninRCTs.

Rel

atio

nshi

p be

twee

n qu

ality

and

stu

dy o

utco

mes

:

Find

ings

hav

e of

ten

been

con

tradi

ctor

y.•

Thefindingsaremixedonwhetherlow-qualitystudiesunder-orover-estim

ateeffectsizes

•co

mpa

red

to h

igh

qual

ity s

tudi

es.

Diff

eren

t sca

les

gene

rate

div

erse

ass

essm

ents

of s

tudy

qua

lity,

whi

ch c

ause

inco

nsis

tenc

y •

in e

fforts

to re

late

stu

dy q

ualit

y to

out

com

e.

Stra

tegi

es to

man

age

qual

ity:

Set

min

imum

leve

ls fo

r inc

lusi

on o

r req

uire

that

cer

tain

qua

lity

attri

bute

s be

pre

sent

—ca

n •

invo

lve

setti

ng in

clus

ion

crite

ria, s

elec

ting

a pr

iori

parti

cula

r qua

lity-

scal

e cu

t-off

scor

es;

limitation:excludingresearchthatmaybeoflowerrigourgoesagainstthescientifichabitof

exam

inin

g da

ta, l

ettin

g th

e da

ta s

peak

.

Wei

ght e

ffect

siz

es b

y qu

ality

sco

res—

allo

ws

incl

usio

n of

div

erse

stu

dies

, but

relie

s on

ques

tiona

ble

qual

ity m

easu

res.

Con

side

r qua

lity

to b

e an

em

piric

al q

uest

ion—

can

exam

ine

asso

ciat

ions

bet

wee

n qu

ality

and

effe

ct s

izes

and

thus

pre

serv

e th

e pu

rpos

e of

met

a-an

alys

is to

sys

tem

atic

ally

exa

min

e data;limitation:problem

swithscalemeasuresofoverallqualitymaylimitconfidencein

findingsrelatedtooverallqualitymeasures.

Con

clus

ions

:

Res

earc

hers

are

incr

easi

ngly

com

bini

ng s

trate

-•

gies

to o

verc

ome

the

limita

tions

of u

sing

a s

ingl

e ap

proa

ch.

Furth

er w

ork

to d

evel

op v

alid

mea

sure

s of

prim

ary

stud

y qu

ality

dim

ensi

ons

will

impr

ove

the

abili

ty o

f met

a-an

alys

es to

info

rm re

sear

ch a

nd

nurs

ing

prac

tice.

40

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esLi

nde,

et a

l.

2002

Ger

man

y

STU

DIE

S

(ran

dom

ized

vs.

no

n-ra

ndom

ized

)

To in

vest

igat

e:

1. P

ossi

ble

diffe

r-en

ces

in p

atie

nts,

in

terv

entio

ns,

desi

gn-in

depe

nden

t qu

ality

asp

ects

, and

re

spon

se ra

tes

of

rand

omiz

ed v

s. n

on-

rand

omiz

ed s

tudi

es

of a

cupu

nctu

re fo

r ch

roni

c he

adac

he.

2. If

non

-ran

dom

ized

st

udie

s pr

ovid

e re

leva

nt a

dditi

onal

in

fo (i

.e.,

met

hods

/re

sults

on

long

-term

ou

tcom

es, p

rogn

ostic

fa

ctor

s, a

dver

se

effe

cts,

resp

onse

ra

tes)

.

3. P

ossi

ble

expl

ana-

tions

if re

spon

se

rate

s in

rand

omiz

ed

and

non-

rand

omiz

ed

patie

nts

diffe

r.

Qua

lity

crite

ria:

Isthesamplingmethodclearandclearlydescribed?

Isthereaclearheadachediagnosis?

Arepatientscharacterized(atleastage,sex,duration,severityofsym

ptom

s)?

Isthereatleastafour-weekbaselineperiod?

Arethereatleasttwoclinicalheadacheoutcom

es?

Isaheadachediaryused?

Areco-interventionsdescribed?

Areatleast90%

ofpatientsincludedanalysedaftertreatment?

Areatleast80%

ofpatientstreatedanalysedatearly(<

6months)follow-up?

Areatleast80%

ofpatientstreatedanalysedatlate(≥6months)follow-up?

Met

hods

/Res

ults

:

59studieswereincluded:24random

izedtrials,and35non-random

izedstudies(fivenon-

•ra

ndom

ized

con

trolle

d co

hort

stud

ies,

10

pros

pect

ive

unco

ntro

lled

stud

ies,

10

case

ser

ies

and

10 c

ross

sec

tiona

l sur

veys

).

10ofthe24RCTsand26ofthe35non-randomizedstudiesmetlessthanfivequality

•cr

iteria

.

A to

tal o

f 535

pat

ient

s re

ceiv

ed a

cupu

nctu

re tr

eatm

ent i

n th

e 24

RC

Ts c

ompa

red

to 2

695

in

•th

e 35

non

-ran

dom

ized

stu

dies

.

On

aver

age,

rand

omiz

ed tr

ials

had

sm

alle

r sam

ple

size

s, m

et m

ore

qual

ity c

riter

ia a

nd h

ad

•lo

wer

resp

onse

rate

s (0

.59;

0.4

8-0.

69) v

s. 0

.78

(0.7

2-0.

83).

Ran

dom

ized

or n

ot, s

tudi

es m

eetin

g m

ore

qual

ity c

riter

ia h

ad lo

wer

resp

onse

rate

s.•

Non-randomizedstudiesdidn’thavesignificantlylongerfollow-upperiods.O

fthethreethat

•in

clud

ed a

n an

alys

is o

f pro

gnos

tic v

aria

bles

, onl

y on

e re

porte

d on

adv

erse

effe

cts,

and

the

degr

ee o

f gen

eral

izab

ility

was

unc

lear

.

41

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esM

acLe

hose

, et a

l.

2000

UK

STU

DIE

S

(RC

Ts v

s. Q

uasi

-ex

perim

enta

l an

d ob

serv

atio

n (Q

EO

))

To c

ompa

re e

stim

ates

of

effe

ctiv

enes

s fro

m

RC

Ts v

s. Q

EO

s an

d ex

amin

e th

e as

soci

atio

n be

twee

n m

etho

dolo

gica

l qua

lity

and

mag

nitu

de o

f ef-

fect

iven

ess

estim

ates

.

Stra

tegy

1: C

ompa

re R

CT

and

QEO

stu

dy e

stim

ates

of e

ffect

iven

ess

of a

ny

inte

rven

tion,

whe

re b

oth

estim

ates

wer

e re

port

ed in

a s

ingl

e pa

per.

14 p

aper

s w

ith 3

8 co

mpa

rison

s w

ere

revi

ewed

; 25

wer

e of

low

qua

lity

and

13 w

ere

of h

igh

qual

ity.

Dis

crep

anci

es b

etw

een

RC

T an

d Q

EO

stu

dy e

stim

ates

of e

ffect

siz

e an

d ou

tcom

e fre

quen

cy

for i

nter

vent

ion

and

cont

rol g

roup

s w

ere

smal

ler f

or h

igh-

than

low

-qua

lity

com

paris

ons.

Con

clus

ion:

QE

O s

tudy

est

imat

es o

f effe

ctiv

enes

s m

ay b

e va

lid if

impo

rtant

con

foun

ding

fact

ors

are

•co

ntro

lled

for.

Low

qua

lity

com

paris

ons

tend

to h

ave

mor

e ex

trem

e Q

EO

stu

dy e

stim

ates

of e

ffect

. •

Met

hods

/resu

lts a

re li

mite

d by

the

num

ber o

f pap

ers

revi

ewed

(few

) and

pot

entia

lly u

nrep

-•

rese

ntat

ive

natu

re o

f evi

denc

e re

view

ed.

Stra

tegy

2: C

ompa

re R

CT

and

QEO

stu

dy e

stim

ates

of e

ffect

iven

ess

for

spec

ified

inte

rven

tions

, whe

re th

e es

timat

es w

ere

repo

rted

in d

iffer

ent

pape

rs.

Inte

rven

tions

: mam

mog

raph

ic s

cree

ning

(MS

BC

) of w

omen

to re

duce

mor

talit

y fro

m b

reas

t ca

ncer

; fol

ic a

cid

supp

lem

enta

tion

(FA

S) t

o pr

even

t neu

ral t

ube

defe

cts

in w

omen

tryi

ng to

co

ncei

ve.

34 p

aper

s w

ere

revi

ewed

(17

on M

SB

C, 1

7 on

FA

S).

Eightandfourpapers,respectively,wereindividuallyorclusterassignedRCTs,fiveandsix

wer

e no

n-ra

ndom

ized

tria

ls o

r coh

ort s

tudi

es, a

nd th

ree

and

six

wer

e m

atch

ed o

r unm

atch

ed

case

–con

trol s

tudi

es. T

wo

stud

ies,

one

of M

SB

C a

nd o

ne o

f FA

S, u

sed

som

e ot

her s

tudy

de

sign

.

Bot

h co

hort

and

case

-con

trol s

tudi

es h

ad lo

wer

tota

l qua

lity

scor

es th

an R

CTs

; coh

ort s

tudi

es

alsohadsignificantlylowerscoresthancase-controlstudies.

Met

a-re

gres

sion

of s

tudy

attr

ibut

es a

gain

st re

lativ

e ris

k es

timat

es s

how

ed n

o as

soci

atio

n be

twee

n ef

fect

siz

e an

d st

udy

qual

ity fo

r eith

er in

terv

entio

n.

Estimatesfrom

RCTsandcohortstudieswerenotsignificantlydifferent,butcase-controlstud-

iesgavesignificantlydifferentestimatesforbothMSBC(greaterbenefit)andFA

S(lessbenefit).

Rec

omm

enda

tions

:

Sta

ndar

ds fo

r rep

ortin

g qu

asi-e

xper

imen

tal a

nd

•ob

serv

atio

nal s

tudi

es s

houl

d be

dev

elop

ed.

Inno

vativ

e se

arch

stra

tegi

es s

houl

d be

exp

lore

d.•

Met

hods

for i

dent

ifyin

g st

udie

s th

at p

rovi

de a

dire

ct c

ompa

rison

of e

stim

ates

from

rand

omiz

ed

and

non-

rand

omiz

ed d

ata

are

need

ed.

42

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esB

ritto

n

1998

UK

STU

DIE

S

(RC

Ts v

s. n

on-

RC

Ts)

To e

xplo

re th

e is

sues

re

late

d to

the

proc

ess

of ra

ndom

izat

ion

that

m

ay a

ffect

the

valid

ity

of c

oncl

usio

ns d

raw

n fro

m th

e M

etho

ds/R

e-su

lts o

f RC

Ts a

nd n

on-

rand

omiz

ed s

tudi

es

Pre

viou

s co

mpa

rison

s of

RC

Ts a

nd n

on-r

ando

miz

ed s

tudi

es:

18 p

aper

s th

at d

irect

ly c

ompa

red

the

Met

hods

/Res

ults

of R

CTs

and

pro

spec

tive

non-

ran-

•do

miz

ed s

tudi

es w

ere

foun

d an

d an

alyz

ed

Nei

ther

the

RC

Ts n

or th

e no

nran

dom

ized

stu

dies

con

sist

ently

gav

e la

rger

or s

mal

ler e

sti-

•m

ates

of t

he tr

eatm

ent e

ffect

.

Thetypeofinterventiondidnotappeartobeinfluential

Exc

lusi

ons:

The

num

ber o

f elig

ible

sub

ject

s in

clud

ed in

the

RC

Ts ra

nged

from

1%

to 1

00%

. •

Rea

sons

for e

xclu

sion

s m

ay b

e m

edic

al (e

.g. h

igh

risk

of a

dver

se e

vent

s in

cer

tain

gro

ups)

•orscientific(selectingonly

smal

l hom

ogen

eous

gro

ups

in o

rder

to in

crea

se th

e pr

ecis

ion

of e

stim

ated

trea

tmen

t ef-

•fe

cts)

; als

o bl

anke

t exc

lusi

ons

are

com

mon

in R

CTs

Par

ticip

atio

n:

Mos

t RC

Ts fa

iled

to d

ocum

ent a

dequ

atel

y th

e ch

arac

teris

tics

of e

ligib

le in

divi

dual

s w

ho d

id

•no

t par

ticip

ate

in tr

ials

.

Adj

ustin

g fo

r bas

elin

e di

ffere

nces

:

in n

on-r

ando

miz

ed s

tudi

es: a

djus

tmen

t for

diff

eren

ces

ofte

n ch

ange

d th

e tre

atm

ent e

ffect

•sizebutnotsignificantly;importantly,thedirectionofchangewasinconsistent.

Con

clus

ions

:

Met

hods

/Res

ults

of R

CTs

and

non

-ran

dom

ized

stud

ies

do n

ot in

evita

bly

diffe

r

The

avai

labl

e ev

iden

ce s

uffe

rs fr

om m

any

•lim

itatio

ns b

ut it

sug

gest

ed th

at it

may

be

pos-

sibl

e to

min

imiz

e an

y di

ffere

nces

by

ensu

ring

that

sub

ject

s in

clud

ed in

eac

h ty

pe o

f stu

dy a

re

com

para

ble.

Lem

mer

, et a

l.

1999

UK

MET

HO

DS

(SR

s)

To re

port

on k

ey

problemsanddifficul-

ties

enco

unte

red

in a

sy

stem

atic

lite

ratu

re

revi

ew th

at, i

n th

e ab

-se

nce

of R

CTs

, dre

w

upon

theo

retic

al s

tud-

ies

of d

ecis

ion-

mak

ing

and

prac

tice-

base

d re

sear

ch.

A to

tal o

f 164

revi

ews

of 8

2 pu

blic

atio

ns w

ere

carr

ied

out.

AmodifiedversionoftheCochraneCollaborationProtocolw

asused:

Dat

abas

es a

nd k

eyw

ords

use

d fo

r sea

rch

stra

tegi

es w

ere

sele

cted

in c

onsu

ltatio

n w

ith a

libra

rian

spec

ializ

ing

in h

ealth

and

rela

ted

subj

ects

.

Om

itted

sec

tions

wer

e co

nsid

ered

inap

prop

riate

for r

epor

ts th

at w

ere

not R

CTs

, and

wer

e •

repl

aced

with

hea

ding

s fo

r a b

road

er ra

nge

of m

etho

dolo

gies

.

Ther

e w

as a

lso

a se

ctio

n at

the

end

of th

e pr

otoc

ol fo

r rev

iew

ers’

com

men

ts o

n th

e ap

prop

riate

-nessandsignificanceofeachreview

.

A nu

mer

ical

sco

re w

as g

iven

to e

ach

artic

le; 8

is th

e m

axim

um s

core

and

1 is

the

min

imum

sc

ore.

This

sca

le w

as u

sed

to in

dica

te th

e m

etho

dolo

gica

l rig

our a

nd re

leva

nce

of e

ach

artic

le.

Qua

litat

ive

stud

ies

prov

ide

little

info

rmat

ion

on m

eth-

odology,makingqualityassessm

entdifficult.•All

revi

ewer

s ag

reed

, how

ever

, tha

t som

e of

the

artic

les

that

wer

e ra

ted

poor

con

tain

ed im

porta

nt is

sues

or

sect

ions

who

se o

mis

sion

wou

ld b

e a

detri

men

t to

the

stud

y.

Reachingaconsensusscorewasdifficultasindi

-vi

dual

revi

ewer

s in

terp

rete

d ar

ticle

s an

d re

sear

ch

desi

gns

in d

iffer

ent w

ays.

43

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esG

olds

mith

, et a

l.

2007

UK

MET

HO

DS

To d

escr

ibe

the

revi

ew

met

hods

dev

elop

ed

andthedifficulties

enco

unte

red

dur-

ing

the

proc

ess

of

upda

ting

a sy

stem

atic

re

view

of e

vide

nce

to

info

rm g

uide

lines

for

the

cont

ent o

f pat

ient

in

form

atio

n re

late

d to

ce

rvic

al s

cree

ning

.

Stu

dies

of w

omen

in a

n or

gani

zed,

sys

tem

atic

cer

vica

l scr

eeni

ng p

rogr

am w

ere

incl

uded

; all

stud

y de

sign

s, e

xcep

t for

opi

nion

pie

ces,

wer

e in

clud

ed.

233

stud

ies

wer

e re

triev

ed fo

r fur

ther

eva

luat

ion.

Dat

a ex

tract

ion

was

con

duct

ed fo

r 79

stud

ies

(52

quan

titat

ive

and

27 q

ualit

ativ

e).

32 s

tudi

es w

ere

repo

rted

as re

leva

nt.

13 q

ualit

ativ

e st

udie

s, s

even

non

-com

para

tive

desc

riptiv

e st

udie

s, th

ree

RC

Ts, t

hree

qua

si-

RC

Ts, t

hree

cro

ss s

ectio

nal s

tudi

es a

nd o

ne e

ach

of c

lust

er R

CT,

retro

spec

tive

case

con

trol

stud

y, a

nd n

on-c

ompa

rativ

e tim

e se

ries

stud

y w

ere

incl

uded

in th

e re

port.

The

qual

ity o

f eac

h in

divi

dual

stu

dy w

as a

sses

sed

usin

g es

tabl

ishe

d ch

eckl

ists

: SIG

N, C

AS

P,

NZG

G, U

KG

CS

RO

and

a re

view

er c

heck

list.

The

basi

c pr

inci

ples

of t

he G

RA

DE

app

roac

h w

ere

appl

ied

to th

e sy

nthe

sis

of th

e qu

antit

ativ

e ev

iden

ce.

Con

clus

ions

:

The

revi

ew q

uest

ions

wer

e be

st a

nsw

ered

by

•ev

iden

ce fr

om a

rang

e of

dat

a so

urce

s.

Qua

litat

ive

rese

arch

was

ofte

n hi

ghly

rele

vant

•andspecifictomanycomponentsofthescreen

-in

g in

form

atio

n m

ater

ials

.

DeW

alt,

et a

l.

2004

US

A

MET

HO

DS

To re

view

the

rela

tion-

ship

bet

wee

n lit

erac

y an

d he

alth

out

com

es.

Rev

iew

of S

tudi

es –

ex

amin

es o

bser

va-

tiona

l stu

dies

that

re

porte

d or

igin

al d

ata

mea

surin

g lit

erac

y w

ith

any

valid

inst

rum

ent

and

mea

sure

d on

e or

m

ore

heal

th o

utco

mes

.

Incl

uded

stu

dies

with

out

com

es re

late

d to

hea

lth a

nd h

ealth

ser

vice

and

mea

sure

men

t of

liter

acy

skill

s w

ith a

val

id in

stru

men

t (W

est e

t al.,

200

2).

Gra

ded

each

stu

dy a

ccor

ding

to:

adeq

uacy

of s

tudy

pop

ulat

ion

com

para

bilit

y of

sub

ject

s•

valid

ity a

nd re

liabi

lity

of th

e lit

erac

y m

easu

rem

ent

mai

nten

ance

of c

ompa

rabl

e gr

oups

appr

opria

tene

ss o

f the

out

com

e m

easu

rem

ent

appr

opria

tene

ss o

f sta

tistic

al a

naly

sis

adeq

uacy

of c

ontro

l of c

onfo

undi

ng

The

aver

age

qual

ity o

f the

incl

uded

stu

dies

was

fair

to g

ood.

The

auth

ors

foun

d th

at m

ost s

tudi

es fa

iled

to a

d-eq

uate

ly a

ddre

ss c

onfo

undi

ng a

nd th

e us

e of

mul

tiple

co

mpa

rison

s.

Thom

son,

et a

l.

2006

UK

MET

HO

DS

To s

ynth

esiz

e da

ta o

n th

e ke

y so

cioe

cono

mic

de

term

inan

ts o

f hea

lth

and

heal

th in

equa

litie

s re

porte

d in

eva

luat

ions

of

the

natio

nal U

K

rege

nera

tion

prog

ram

.

Incl

uded

eva

luat

ions

that

repo

rted

achi

evem

ents

dra

win

g on

dat

a fro

m a

t lea

st tw

o ta

rget

are

as

of a

nat

iona

l urb

an re

gene

ratio

n pr

ogra

m, o

r are

a-ba

sed

initi

ativ

es (A

BIs

) in

the

UK

.

Nin

etee

n ev

alua

tions

repo

rted

impa

cts

on h

ealth

or s

ocio

econ

omic

det

erm

inan

ts o

f hea

lth. D

ata

was

syn

thes

ized

from

10

eval

uatio

ns.

Sta

ndar

d sy

stem

atic

revi

ew p

roce

dure

s w

ere

used

, inc

ludi

ng:

com

preh

ensi

ve s

earc

h st

rate

gy;

a pr

iori

incl

usio

n/ex

clus

ion

crite

ria;

two

peop

le in

depe

nden

tly re

view

ing

artic

les;

data

ext

ract

ion;

data

syn

thes

is.

Met

hodo

logi

cal s

hortc

omin

gs o

f the

prim

ary

stud

ies

chal

leng

ed th

e da

ta s

ynth

esis

pro

cess

.

The

auth

ors

sugg

este

d th

at in

terv

entio

ns a

nd

prog

ram

s at

tach

ed to

a th

eore

tical

fram

ewor

k (e

.g.,

theo

ry o

f cha

nge)

will

hel

p to

bui

ld e

vide

nce.

44

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esS

tein

, et a

l.

2005

US

A

MET

HO

DS

(hea

lth te

chno

l-og

y as

sess

men

ts

(HTA

s))

To in

vest

igat

e th

e as

soci

atio

n be

twee

n fre

quen

cy a

nd m

eth-

odol

ogic

al c

hara

c-te

ristic

s in

a s

ampl

e of

hea

lth te

chno

logy

as

sess

men

ts.

The

rese

arch

ers

incl

uded

revi

ews

on fo

ur in

terv

entio

ns in

thre

e re

ports

.

Dat

a w

as e

xtra

cted

from

pub

lishe

d an

d un

publ

ishe

d re

ports

and

was

ext

ract

ed b

y on

e re

view

er

and

chec

ked

by a

sec

ond.

Vario

us re

gres

sion

ana

lysi

s te

sts

wer

e un

derta

ken

on th

e da

ta.

The

stud

y ou

tcom

e m

easu

res

wer

e re

porte

d at

diff

eren

t tim

e pe

riods

, dep

endi

ng o

n th

e le

ngth

of

follo

w-u

p of

the

entir

e st

udy.

The

revi

ew lo

oked

at:

sam

ple

size

pros

pect

ive/

retro

spec

tive

appr

oach

mul

ti- o

r sin

gle-

cent

re o

rgan

izat

ions

cons

ecut

ive

recr

uitm

ent

inde

pend

ence

of o

utco

me

mea

sure

leng

th o

f fol

low

-up

publ

icat

ion

date

Find

ings

:

Littl

e ev

iden

ce w

as fo

und

of a

n as

soci

atio

n be

twee

n m

etho

dolo

gica

l cha

ract

eris

tics

and

•ou

tcom

e.

Sam

ple

size

and

pro

spec

tive

appr

oach

wer

e no

t sho

wn

to b

e as

soci

ated

with

out

com

e.•

All

outc

omes

wer

e su

rgic

al in

terv

entio

ns, w

hich

may

im

pact

gen

eral

izab

ility

.

The

smal

l num

ber o

f cas

es a

nd li

mite

d nu

mbe

r of

stud

ies

in e

ach

set o

f cas

e se

ries

may

lim

it th

e pr

eci-

sion

and

gen

eral

izab

ility

.

Cas

e se

ries

may

be

parti

cula

rly p

rone

to p

ublic

atio

n bi

as.

45

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esD

alzi

el, e

t al.

2005

UK

MET

HO

DS

(cas

e se

ries

in

HTA

s)

STU

DIE

S

(cas

e se

ries

vs.

RC

Ts)

1. T

o re

view

the

use

of c

ase

serie

s (C

S) i

n N

atio

nal I

nstit

ute

for

Clin

ical

Exc

elle

nce

(NIC

E) H

TA re

ports

.

2. T

o sy

stem

atic

ally

re

view

the

met

hod-

olog

ical

lite

ratu

re fo

r pa

pers

rela

ting

to th

e va

lidity

of a

spec

ts o

f ca

se s

erie

s de

sign

.

3. T

o in

vest

igat

e ch

ar-

acteristicsandfindings

of c

ase

serie

s us

ing

exam

ples

from

the

UK

’s H

TA p

rogr

am

Rev

iew

of u

se o

f cas

e se

ries

in N

ICE

HTA

s:

Of 4

7 co

mpl

eted

HTA

s, 1

4 in

clud

ed in

form

atio

n fro

m C

Ss.

Incl

usio

n cr

iteria

for C

Ss

incl

uded

stu

dy s

ize

and

leng

th o

f fol

low

-up.

Ther

e w

as n

o co

nsen

sus

on w

hich

CS

s to

incl

ude

in H

TAs,

how

to u

se th

em o

r how

to a

sses

s th

eir q

ualit

y.

Sys

tem

atic

revi

ew o

f met

hodo

logi

cal l

itera

ture

:

Asearchwasconductedtofindstudiesthatassessedaspectsofcaseseriesdesign,analysis

or q

ualit

y in

rela

tion

to s

tudy

val

idity

.

No

em

piric

al s

tudi

es w

ere

foun

d.

Investigationofcharacteristicsandfindingsofcaseseriesdesigns:

No

rela

tions

hip

was

foun

d be

twee

n sa

mpl

e si

ze a

nd o

utco

me

frequ

ency

or b

etw

een

pros

pec-

tive

data

col

lect

ion

and

outc

ome

frequ

ency

.

Oneanalysiseach(indifferenttopicareas)foundasignificantassociationbetweenmulti-centre

stud

ies

and

outc

ome,

bet

wee

n in

depe

nden

t out

com

e m

easu

rem

ent a

nd o

utco

me

frequ

ency

an

d be

twee

n ea

rlier

pub

licat

ion

and

outc

ome

frequ

ency

.

Lengthoffollow-upwasfoundtobesignificantlyassociatedwithoutcomefrequencyinthree

anal

yses

.

Com

paris

on b

etw

een

case

ser

ies

and

RC

Ts:

Com

pare

d w

ith R

CT

evid

ence

, whi

ch s

how

ed n

o di

ffere

nce

betw

een

PTC

A an

d C

AB

G, c

ase

serie

s es

timat

es o

f mor

talit

y sh

owed

a 1

–2%

incr

ease

in m

orta

lity

for C

AB

G.

For a

ngin

a re

curr

ence

, nei

ther

cas

e se

ries

nor R

CT

data

sho

wed

any

diff

eren

ce b

etw

een

the

two

inte

rven

tions

.

The

incl

uded

stu

dies

wer

e al

l sur

gica

l int

erve

ntio

ns

whi

ch m

ight

lim

it ge

nera

lizab

ility

to o

ther

inte

rven

-tio

ns o

r set

tings

.

The

stud

y dr

ew o

n a

smal

l num

ber o

f cas

es, t

here

-fo

re re

sults

sho

uld

be v

iew

ed w

ith c

autio

n.

46

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esD

eeks

, et a

l.

2003

UK

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

(eva

luat

e ab

ility

of

case

-mix

adj

ust-

men

t to

cont

rol f

or

bias

)

To c

onsi

der m

etho

ds

and

rela

ted

evid

ence

fo

r eva

luat

ing

bias

in

non

-ran

dom

ized

in

terv

entio

n st

udie

s.

Eig

ht s

tudi

es c

ompa

red

the

resu

lts o

f ran

dom

ized

and

non

-ran

dom

ized

stu

dies

acr

oss

mul

tiple

in

terv

entio

ns.

Atotalof194toolswereidentifiedforassessingnon-randomizedstudies.

Find

ings

:

The

resu

lts fr

om n

on-r

ando

miz

ed s

tudi

es s

omet

imes

diff

er fr

om th

e re

sults

of r

ando

miz

ed

•st

udie

s of

the

sam

e in

terv

entio

n.

Man

y qu

ality

ass

essm

ent t

ools

exi

st, b

ut a

ppra

isal

of s

tudi

es o

mits

key

qua

lity

dom

ains

.•

Non

-ran

dom

ized

stu

dies

sho

uld

only

be

unde

rtake

n w

hen

RC

Ts a

re n

ot fe

asib

le.

Kat

rak,

et a

l.

2004

Aus

tralia

CR

ITIC

AL

A

PPR

AIS

AL

TO

OLS

To s

umm

ariz

e co

nten

t, in

tent

, con

stru

ctio

n an

d ps

ycho

met

-ric

pro

perti

es o

f pu

blis

hed,

cur

rent

ly

avai

labl

e cr

itica

l ap-

prai

sal t

ools

to id

entif

y co

mm

on e

lem

ents

and

th

eir r

elev

ance

to a

l-lie

d he

alth

rese

arch

.

A to

tal o

f 121

pub

lishe

d cr

itica

l app

rais

al to

ols

wer

e in

clud

ed in

the

SR

, sou

rced

from

108

pa

pers

.

Find

ings

:

87%oftoolswerespecifictoresearchdesign,withmosttoolshavingbeendevelopedfor

•ex

perim

enta

l stu

dies

(38%

of a

ll to

ols

sour

ced)

.

Ther

e w

as g

reat

var

iabi

lity

in it

ems

cont

aine

d in

the

criti

cal a

ppra

isal

tool

s.•

12%(n=14instruments)ofavailabletoolsweredevelopedusingspecifiedempirical

•re

sear

ch.

49%

of t

he to

ols

sum

mar

ized

qua

lity

appr

aisa

l int

o a

num

eric

sum

mar

y sc

ore.

Few

tool

s ha

d do

cum

ente

d ev

iden

ce o

f val

idity

of t

heir

item

s, o

r rel

iabi

lity

of u

se.

Gui

delin

es re

gard

ing

adm

inis

tratio

n of

tool

s w

ere

prov

ided

in 4

3% (n

= 5

2) o

f cas

es.

Con

clus

ions

:

Ther

e is

no

“gol

d st

anda

rd” c

ritic

al a

ppra

isal

tool

for a

ny s

tudy

des

ign,

nor

is th

ere

any

wid

ely

ac-

cept

ed g

ener

ic to

ol th

at c

an b

e ap

plie

d eq

ually

-w

ell a

cros

s st

udy

type

s.

Con

sum

ers

of re

sear

ch s

houl

d ca

refu

lly s

elec

t •

tool

s.

Sel

ecte

d to

ols

shou

ld h

ave

publ

ishe

d ev

iden

ce

•of

em

piric

al b

asis

for t

heir

cons

truct

ion,

val

idity

of

item

s an

d re

liabi

lity

of in

terp

reta

tion,

and

gu

idel

ines

for u

se.

47

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esG

reen

halg

h, e

t al.

2005

UK

MET

HO

DS

(met

a-na

rrat

ive

revi

ew)

To in

trodu

ce/d

escr

ibe

and

repo

rt on

the

deve

lopm

ent o

f a n

ew

met

hod:

met

a-na

rra-

tive

revi

ew.

Met

a-na

rrat

ive

revi

ew w

as d

evel

oped

as

the

met

hodo

logi

cal b

ase

for t

he s

ynth

esis

of e

vide

nce

acrossmultipledisciplinaryfields.Ithasparticularstrengthsasasynthesismethodwhen:

the

scop

e of

a p

roje

ct is

bro

ad a

nd th

e lit

erat

ure

dive

rse;

diffe

rent

gro

ups

of s

cien

tists

hav

e as

ked

diffe

rent

que

stio

ns a

nd u

sed

diffe

rent

rese

arch

desi

gns

to a

ddre

ss a

com

mon

pro

blem

;

‘quality’papershavedifferentdefiningfeaturesindifferentliterature;

ther

e is

no

univ

ersa

lly a

gree

d-up

on p

roce

ss fo

r pul

ling

the

diffe

rent

bod

ies

of li

tera

ture

toge

ther

.

The

stud

y in

volv

ed a

sys

tem

atic

map

ping

pha

se to

col

lect

and

com

pare

the

diffe

rent

ove

r-ar

chin

g st

oryl

ines

of t

he ri

se a

nd fa

ll of

diff

usio

n re

sear

ch th

at is

judg

ed re

leva

nt to

the

over

all

rese

arch

que

stio

n.

Pha

ses

of m

eta-

narr

ativ

e re

view

:

plan

ning

sear

chin

g•

map

ping

appr

aisa

l•

synt

hesi

s•

reco

mm

enda

tions

Five

key

prin

cipl

es u

nder

pin

the

met

a-na

rrat

ive

tech

niqu

e: p

ragm

atis

m, p

lura

lism

, his

toric

ity,

cont

esta

tion

and

peer

revi

ew.

The

met

hod

shou

ld b

e te

sted

pro

spec

tivel

y.

Its c

ontri

butio

n to

the

mix

ed e

cono

my

of m

etho

ds fo

r th

e sy

stem

atic

revi

ew o

f com

plex

evi

denc

e sh

ould

be

expl

ored

furth

er.

Whi

ttem

ore

&

Knafl

2005

US

A

MET

HO

DS

(inte

grat

ive

revi

ew)

To d

istin

guis

h th

e in

te-

grat

ive

revi

ew m

etho

d fro

m o

ther

revi

ew

met

hods

and

to p

ro-

pose

met

hodo

logi

cal

strategiesspecificto

the

inte

grat

ive

revi

ew

met

hod

to e

nhan

ce

rigou

r.

Issu

es d

escr

ibed

and

stra

tegi

es u

sed

to e

nhan

ce th

e rig

our o

f the

inte

grat

ive

revi

ew m

etho

d in

nu

rsin

g:

Well-specifiedreview

purposeandvariablesofinterest—

helptoaccuratelyoperationalize

•va

riabl

es a

nd e

xtra

ct a

ppro

pria

te d

ata

from

prim

ary

sour

ces.

Well-definedliteraturesearchstrategies—identifythemaximum

num

berofeligibleprim

ary

•so

urce

s, u

sing

at l

east

two

to th

ree

stra

tegi

es.

Dat

a ev

alua

tion

stag

e—si

nce

inte

grat

ive

revi

ew m

etho

d in

clud

es d

iver

se p

rimar

y so

urce

s,

•th

e co

mpl

exity

of e

valu

atin

g th

e qu

ality

incr

ease

s; h

ow q

ualit

y is

eva

luat

ed w

ill v

ary

depe

ndin

g on

the

sam

plin

g fra

me;

in a

revi

ew w

ith a

div

erse

sam

plin

g fra

me

incl

usiv

e of

em

piric

al a

nd th

eore

tical

sou

rces

, an

appr

oach

sim

ilar t

o hi

stor

ical

rese

arch

to e

valu

ate

qual

ity m

ay b

e ap

prop

riate

.

Dat

a re

duct

ion—

divi

de th

e pr

imar

y so

urce

s in

to s

ubgr

oups

acc

ordi

ng to

som

e lo

gica

l •

syst

em to

faci

litat

e an

alys

is.

Datadisplay,datacomparison,conclusiondraw

ingandverification,presentation.

The

inte

grat

ive

revi

ew m

etho

d al

low

s fo

r the

com

bi-

natio

n of

div

erse

met

hodo

logi

es (e

.g.,

expe

rimen

tal

and

non-

expe

rimen

tal r

esea

rch)

and

has

the

pote

ntia

l to

pla

y a

grea

ter r

ole

in e

vide

nce-

base

d pr

actic

e.

48

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esN

orris

& A

tkin

s

2005

US

A

MET

HO

DS

(SR

s)

To e

xam

ine

the

use

of

non-

rand

omiz

ed s

tud-

ies

in e

vide

nce-

base

d pr

actic

e ce

nter

(EP

C)

repo

rts, a

ddre

ssin

g qu

estio

ns o

f the

effe

c-tiv

enes

s of

trea

tmen

t in

terv

entio

ns.

Of t

he 1

07 E

PC

repo

rts re

leas

ed b

etw

een

Feb

1999

and

Sep

t 200

4, 7

8 ex

amin

ed a

t lea

st o

ne

questionofefficacyoreffectivenessofaclinicalintervention.

49 o

f the

repo

rts in

clud

ed e

vide

nce

from

stu

dy d

esig

ns o

ther

than

RC

Ts; t

hese

repo

rts

exam

ined

pha

rmac

othe

rapy

, med

ical

dev

ises

, sur

gery

, com

plem

enta

ry a

nd a

ltern

ativ

e an

d be

havi

oura

l int

erve

ntio

ns.

Cha

lleng

es in

usi

ng n

on-r

ando

miz

ed s

tudi

es in

sys

tem

atic

revi

ews:

Term

inol

ogy

to d

escr

ibe

diffe

rent

non

- ran

dom

ized

stu

dy d

esig

ns is

inco

nsis

tent

in c

linic

al

rese

arch

lite

ratu

re a

nd E

PC

repo

rts.

Ther

e ar

e no

est

ablis

hed

guid

elin

es a

s to

whe

n no

n-ra

ndom

ized

stu

dies

can

or s

houl

d be

co

nsid

ered

for i

nclu

sion

in s

yste

mat

ic re

view

s, o

r wha

t stu

dy d

esig

ns to

con

side

r.

Itisdifficulttoassessthequalityofnon-randomizedstudies.

Of 4

9 re

ports

that

incl

uded

non

-ran

dom

ized

stu

dy d

esig

ns:

12 (2

5%) d

idn’

t ass

ess

stud

y qu

ality

;•

16%

use

d a

chec

klis

t or s

corin

g sy

stem

;•

10%

ada

pted

a p

revi

ousl

y pu

blis

hed

inst

rum

ent;

the

rem

aind

er (4

9%) u

sed

inst

rum

ents

that

the

revi

ewer

s ha

d de

velo

ped

them

selv

es.

Rec

omm

enda

tions

for s

yste

mat

ic re

view

ers:

Ass

ess

the

avai

labi

lity

of R

CTs

bef

ore

dete

rmin

-•ingfinalinclusioncriteria.

Con

side

r the

pro

s an

d co

ns o

f non

-RC

T de

sign

s.•

Pro

vide

a ra

tiona

le fo

r dec

isio

ns to

incl

ude/

ex-

•cludespecificstudydesigns.

Ass

ess

impo

rtant

dom

ains

of s

tudy

qua

lity.

Dis

cuss

how

incl

udin

g va

rious

stu

dy d

esig

ns m

ay

•af

fect

con

clus

ions

.

Dis

cuss

how

the

qual

ity o

f stu

dies

and

the

body

of e

vide

nce

may

affe

ct c

oncl

usio

ns.

Rec

omm

enda

tions

for r

esea

rche

rs:

Min

imiz

e so

urce

s of

bia

s re

gard

less

of s

tudy

desi

gn.

Exa

min

e th

e ef

fect

s of

var

ious

sou

rces

of b

ias

on

•m

easu

red

outc

omes

.

Use

con

sist

ent t

erm

inol

ogy

for s

tudy

des

ign.

Dev

ise

com

preh

ensi

ve s

earc

h st

rate

gies

for n

on-

•ra

ndom

ized

stu

dy d

esig

ns.

Jack

son

& W

ater

s

2005

Aus

tralia

MET

HO

DS

(SR

s)

To p

rovi

de re

com

men

-da

tions

to re

view

ers

on th

e is

sues

to

addr

ess

with

in a

pub

lic

heal

th s

yste

mat

ic

revi

ew a

nd to

indi

rect

ly

prov

ide

advi

ce to

re

sear

cher

s on

the

repo

rting

requ

irem

ents

of

prim

ary

stud

ies

for

the

prod

uctio

n of

hig

h qu

ality

sys

tem

atic

re

view

s.

To e

nsur

e re

view

s m

eet t

he n

eeds

of u

sers

, est

ablis

h an

adv

isor

y gr

oup

(mem

bers

sho

uld

be

fam

iliar

with

topi

c ar

ea, u

sefu

l to

incl

ude

pers

pect

ives

of p

olic

y m

aker

s, fu

nder

s, p

ract

ition

ers,

po

tent

ial u

sers

).

Issu

es to

add

ress

whe

n re

view

ing

publ

ic h

ealth

inte

rven

tions

:

incl

usio

n of

stu

dy d

esig

ns•

sear

ch fo

r pub

lic h

ealth

lite

ratu

re•

qual

ity a

sses

smen

t•

theo

retic

al fr

amew

orks

for i

nter

vent

ion

inte

grity

of i

nter

vent

ions

hete

roge

neity

inte

grat

ion

of q

ualit

ativ

e an

d qu

antit

ativ

e st

udie

s•

ethi

cs a

nd in

equa

litie

s –

heal

th in

terv

entio

ns e

ffect

ive

in re

duci

ng in

equa

litie

s an

d im

prov

-•

ing

heal

th o

f mar

gina

lized

/ dis

adva

ntag

ed

Rec

omm

enda

tions

:

Use

the

Qua

lity

Ass

essm

ent T

ool f

or Q

uant

itativ

e •

Stu

dies

dev

elop

ed b

y Th

omas

et a

l., 2

004.

49

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esA

tkin

s &

Di-

Gui

sepp

i

1998

US

A

MET

HO

DS

(res

earc

h on

pr

even

tive

heal

th

care

)

Dra

win

g on

exp

erie

nc-

es fr

om th

e U

S P

re-

vent

ive

Ser

vice

s Ta

sk

Forc

e, to

out

line

som

e m

ajor

are

as w

here

re

sear

ch is

nee

ded

to

definetheappropri-

ateuseofspecific

prev

entiv

e se

rvic

es

(e.g

., sc

reen

ing

test

s,

coun

selli

ng in

terv

en-

tions

, im

mun

izat

ions

, ch

emop

roph

ylax

is).

Obs

erva

tiona

l stu

dies

:

help

to e

stab

lish

linka

ges

in c

ausa

l pat

hway

;•

help

to u

nder

stan

d na

tura

l his

tory

of d

isea

se a

nd id

entif

y ris

k fa

ctor

s, m

easu

ring

com

pli-

•an

ce w

ith a

nd a

dver

se e

ffect

s of

trea

tmen

ts;

help

to d

eter

min

e ac

cura

cy o

f dia

gnos

tic te

sts;

helptoassessefficacyofinterventions;

have

an

inhe

rent

bia

s.•

Rec

omm

enda

tions

bas

ed s

olel

y on

obs

erva

tiona

l evi

-de

nce

requ

ire h

igh-

qual

ity s

tudi

es s

how

ing

cons

iste

nt

and

pref

erab

ly la

rge

effe

cts.

Whe

re o

nly

a fe

w o

bser

vatio

nal s

tudi

es o

f ade

quat

e qu

ality

are

ava

ilabl

e or

effe

ct s

izes

are

mod

est o

r in-

cons

iste

nt, a

dditi

onal

stu

dies

in d

iffer

ent p

opul

atio

ns

wou

ld b

e a

valu

able

add

ition

to th

e lit

erat

ure.

Har

den,

et a

l.

2004

UK

MET

HO

DS

(vie

ws

stud

ies)

To d

escr

ibe

the

met

hods

dev

elop

ed

for r

evie

win

g re

sear

ch

on p

eopl

e’s

pers

pec-

tives

and

exp

erie

nces

(‘‘

view

s’’ s

tudi

es)

alon

gsid

e tri

als,

with

in

a se

ries

of re

view

s on

yo

ung

peop

le’s

men

tal

heal

th, p

hysi

cal a

ctiv

-ity

and

hea

lthy

eatin

g.

Two

type

s of

stu

dies

wer

e re

view

ed:

1. In

terv

entio

n st

udie

s –

to id

entif

y ef

fect

ive,

inef

fect

ive

and

harm

ful i

nter

vent

ions

.

2. N

on-in

terv

entio

n st

udie

s –

to d

escr

ibe

fact

ors

asso

ciat

ed w

ith m

enta

l hea

lth, p

hysi

cal a

ctiv

-ity

and

hea

lthy

eatin

g.

“Vie

ws”

stu

dies

:

35 m

et th

e in

clus

ion

crite

ria.

Thestudiesvariedinthemethodsused—

manycouldnoteasilybeclassifiedas‘‘qualitative’’or

‘‘qua

ntita

tive.

’’

Mos

t fai

led

to m

eet s

even

bas

ic m

etho

dolo

gica

l rep

ortin

g st

anda

rds

used

in a

new

ly d

evel

oped

qu

ality

ass

essm

ent t

ool.

Thebenefitsofbringingtogetherviewstudiesinasystematicwayinclude:

gain

ing

a gr

eate

r bre

adth

of p

ersp

ectiv

es a

nd a

dee

per u

nder

stan

ding

of p

ublic

hea

lth is

-•

sues

from

the

poin

t of v

iew

of t

hose

targ

eted

by

inte

rven

tions

;

helpingreflectonstudymethodsthatmaydistort,misrepresentorfailtopickuppeople’s

•vi

ews;

crea

ting

grea

ter o

ppor

tuni

ties

for p

eopl

e’s

own

pers

pect

ives

and

exp

erie

nces

to in

form

polic

ies

to p

rom

ote

thei

r hea

lth.

Em

ergi

ng fr

amew

ork

repo

rted

in s

ubse

quen

t pub

lica-

tion

(Oliv

er, e

t al.,

200

5)

50

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esW

ong

& R

aabe

1996

US

A

MET

HO

DS

(met

a-re

view

)

To p

rovi

de s

ome

basi

c gu

idel

ines

for

non-

epid

emio

logi

sts

to e

valu

ate

met

a-an

al-

yses

in o

ccup

atio

nal

coho

rt st

udie

s.

Som

e pr

oble

ms

and

limita

tions

of t

radi

tiona

l qua

litat

ive

revi

ews:

Sel

ectio

n of

stu

dies

was

usu

ally

not

sub

ject

to p

re-d

eter

min

ed c

riter

ia.

Wei

ghts

ass

igne

d to

indi

vidu

al s

tudi

es w

ere

usua

lly s

ubje

ctiv

e.•

No

quan

titat

ive

sum

mar

ies

of ri

sk e

stim

ates

wer

e do

ne.

Find

ings

:

Rat

her t

han

repl

acin

g qu

alita

tive

revi

ews,

qua

ntita

tive

met

a-an

alys

is s

houl

d be

mad

e pa

rt •

of th

e ov

eral

l ass

essm

ent.

The

term

“met

a-re

view

” is

prop

osed

to e

mph

asiz

e th

e im

porta

nce

of b

oth

qual

itativ

e an

d •

quan

titat

ive

com

pone

nts

in a

com

preh

ensi

ve re

view

pro

cess

, mea

ning

it in

clud

es a

met

a-an

alys

is.

Bas

ic s

teps

in c

ondu

ctin

g a

met

a-re

view

:

1.Definetheresearchquestion.

2. C

ondu

ct a

lite

ratu

re s

earc

h.

3. D

eter

min

e cr

iteria

for i

nclu

sion

.

4. C

ondu

ct a

trad

ition

al q

ualit

ativ

e re

view

.

5. C

ondu

ct a

qua

ntita

tive

met

a-an

alys

is.

6. In

tegr

ate

tradi

tiona

l qua

litat

ive

revi

ew w

ith q

uant

itativ

e m

eta-

anal

ysis

.

7. A

pply

crit

eria

for c

ausa

tion

in in

terp

reta

tion.

Benefitsofm

eta-review

:

usef

ul in

sel

ectin

g st

udie

s, in

org

aniz

ing,

pre

sent

-•

ing

and

sum

mar

izin

g m

etho

ds/ r

esul

ts fr

om

indi

vidu

al s

tudi

es;

can

also

be

used

to d

etec

t het

erog

enei

ty a

mon

g •

stud

ies.

Majorbenefitsofconductingameta-analysisinclude:

enha

ncin

g st

atis

tical

pow

er (e

spec

ially

whe

n •

orig

inal

stu

dies

are

sm

all);

prov

idin

g a

prop

erly

wei

ghte

d su

mm

ary

risk

•es

timat

e;

taki

ng c

onsi

sten

cy o

f stu

dy a

nd m

etho

ds/re

sults

into

con

side

ratio

n;

min

imiz

ing

the

prob

lem

of m

ultip

le c

ompa

rison

s;•

exam

inin

g da

ta fo

r het

erog

enei

ty.

San

delo

wsk

i, et

al

.

2007

US

A

MET

HO

DS

(met

a-su

mm

ary)

To a

ddre

ss th

e ch

al-

leng

e of

man

agin

g th

e di

ffere

nces

pre

sum

ed

to e

xist

bet

wee

n qu

ali-

tativ

e an

d qu

antit

ativ

e re

sear

ch, a

nd a

dvan

ce

the

use

of q

ualit

ativ

e m

eta-

sum

mar

y as

a

tech

niqu

e us

eful

for

synt

hesi

zing

qua

lita-

tive

and

quan

titat

ive

descriptivefindings.

42 re

ports

(35

jour

nal a

rticl

es, s

ix u

npub

lishe

d th

eses

or d

isse

rtatio

ns a

nd o

ne te

chni

cal r

epor

t) w

ere

incl

uded

.

Of t

he 4

2 re

ports

, 12

wer

e qu

alita

tive

stud

ies,

thre

e w

ere

inte

rven

tion

stud

ies,

one

was

a m

ixed

m

etho

ds s

tudy

and

26

wer

e va

rietie

s of

qua

ntita

tive

obse

rvat

iona

l stu

dies

.

Feat

ures

of q

ualit

ativ

e m

eta-

sum

mar

ies

incl

ude:

a qu

antit

ativ

ely-

orie

nted

agg

rega

tion

appr

oach

to re

sear

ch s

ynth

esis

;•

theextraction,groupingandformattingoffindings,andthecalculationoffrequencyand

•in

tens

ity e

ffect

siz

es;

able

to s

ynth

esiz

e m

etho

ds/re

sults

of q

ualit

ativ

e an

d qu

antit

ativ

e su

rvey

s of

resp

onse

s •

obta

ined

from

sim

ilar d

ata

colle

ctio

n an

d an

alys

is p

roce

dure

s;

abletoconductaposteriorianalysesoftherelationshipbetweenreportsandfindings.

In c

ontra

st w

ith m

ost s

yste

mat

ic re

view

met

hods

, thi

s st

udy

is b

iase

d to

war

d in

clus

ion

rath

er th

an e

xclu

sion

of

stu

dies

.

51

Stud

yPu

rpos

eM

etho

ds/R

esul

tsC

omm

ents

/Issu

esJe

ffers

on &

Dem

i-ch

elli

1999

UK

STU

DIE

S

(exp

erim

enta

l vs.

no

n-ex

perim

enta

l)

To e

xam

ine

the

rela

-tio

n be

twee

n ex

peri-

men

tal a

nd n

on-e

xper

-im

enta

l stu

dy d

esig

ns

in v

acci

nolo

gy.

The

auth

ors

desc

ribed

exp

erim

enta

l and

non

-exp

erim

enta

l stu

dies

and

thei

r app

roac

hes.

They

ass

esse

d ea

ch s

tudy

des

ign’

s ca

pabi

lity

of te

stin

g fo

ur a

spec

ts o

f vac

cine

per

form

ance

, na

mel

y im

mun

ogen

icity

, dur

atio

n of

imm

unity

con

ferr

ed, i

ncid

ence

and

ser

ious

ness

of s

ide

ef-

fect

s an

d nu

mbe

r of i

nfec

tions

pre

vent

ed b

y va

ccin

atio

n.

Non

-exp

erim

enta

l des

igns

sho

uld

be a

pplie

d w

hen:

an e

xper

imen

t is

impo

ssib

le b

ecau

se th

e ai

m o

f •

eval

uatio

n is

to a

sses

s va

ccin

e ef

fect

iven

ess;

an e

xper

imen

t is

unne

cess

ary

(e.g

., S

mal

lpox

vacc

inat

ion

eval

uatio

n);

an e

xper

imen

t is

inap

prop

riate

(tria

l pop

ulat

ion

•no

t lar

ge e

noug

h to

det

ect e

vent

/out

com

e, re

luc-

tanc

e/re

fusa

l to

parti

cipa

te, e

thic

al/le

gal/p

oliti

cal

obst

acle

s);

individualefficacyismeasuredintermsofan

•in

frequ

ent a

dver

se e

vent

;

inte

rven

tions

pre

vent

rare

eve

nts;

the

popu

latio

n ef

fect

iven

ess

of th

e va

ccin

e is

to

•be

mea

sure

d in

term

s of

long

-term

, rar

e an

d se

ri-ou

s co

nseq

uenc

es o

f dis

ease

.

Rot

stei

n &

Lau

-pa

cis

2004

Can

ada

STU

DIE

S

(HTA

s vs

. SR

s)

To e

luci

date

the

impo

rtant

diff

eren

ces

betw

een

heal

th te

ch-

nolo

gy a

sses

smen

ts

(HTA

s) a

nd s

yste

mat

ic

revi

ews

(SR

).

The

rese

arch

ers

inte

rvie

wed

17

auth

ors

or u

sers

of H

TA.

TheyidentifiedsevenareasofdifferencesbetweenHTA

sandSRs:

1. m

etho

dolo

gica

l sta

ndar

ds (H

TAs

may

incl

ude

liter

atur

e of

rela

tivel

y po

or m

etho

dolo

gica

l qu

ality

if a

topi

c is

of i

mpo

rtanc

e to

dec

isio

n-m

aker

s)

2. re

plic

atio

n of

pre

viou

s st

udie

s (r

elat

ivel

y co

mm

on fo

r HTA

s bu

t not

sys

tem

atic

revi

ews)

3. c

hoic

e of

topi

cs (m

ore

polic

y-or

ient

ed fo

r HTA

s, w

hile

sys

tem

atic

revi

ews

tend

to b

e dr

iven

by

rese

arch

er in

tere

st)

4. in

clus

ion

of c

onte

nt e

xper

ts a

nd p

olic

y-m

aker

s as

aut

hors

(pol

icy-

mak

ers

mor

e lik

ely

to b

e includedinHTA

s,althoughtherearepotentialconflictsofinterest)

5. in

clus

ion

of e

cono

mic

eva

luat

ions

(mor

e of

ten

with

HTA

s, a

lthou

gh e

cono

mic

eva

luat

ions

ba

sed

upon

poo

r clin

ical

dat

a m

ay n

ot b

e us

eful

)

6. m

akin

g po

licy

reco

mm

enda

tions

(mor

e lik

ely

with

HTA

s, a

lthou

gh th

is m

ust b

e do

ne w

ith

caut

ion)

7. d

isse

min

atio

n of

the

repo

rt (m

ore

ofte

n ac

tivel

y do

ne fo

r HTA

s)

HTA

s ar

e no

t sol

ely

conc

erne

d w

ith th

e ev

alua

tion

of

scientificevidence.

52

53

Reference List

Scottish Intercollegiate Guidelines Network 2001–2005. (2001). SIGN 50: A guideline developers’ handbook. Retrieved February 1, 2009 from http://www.sign.ac.uk/guidelines/fulltext/50/in-dex.html.

Critical Appraisal Skills Programme (CASP) and evidence-based practice. (2005). Critical Appraisal Skills Programme: Making sense of evidence - CASP appraisal tools. Oxford, Public Health Resource Unit, Milton Keynes Primary Care NHS Trust. Retrieved February 1, 2009 from http://www.phru.nhs.uk/Pages/PHD/resources.htm.

Atkins, D., Briss, P. A., Eccles, M., Flottorp, S., Guyatt, G. H., Harbour, R. T., et al. (2005). Systems for grading the quality of evidence and the strength of recommendations II: Pilot study of a new system. BMC Health Services Research, 5, 25.

Atkins, D., & DiGuiseppi, C. G. (1998). Broadening the evidence base for evidence-based guidelines. A research agenda based on the work of the U.S. Preventive Services Task Force. American Journal of Preventive Medicine, 14, 335-344.

Benson, K., & Hartz, A. J. (2000). A comparison of observational studies and randomized, controlled trials. The New England Journal of Medicine, 342, 1878-1886.

Britton, A., McKee, M., Black, N., McPherson, K., Sanderson, C., & Bain, C. (1998). Choosing be-tween randomised and non-randomised studies: A systematic review. Health Technology Assessment, 2, 1-124.

Chou, R., & Helfand, M. (2005). Challenges in systematic reviews that assess treatment harms. An-nals of Internal Medicine, 142, 1090-1099.

Conn, V. S., & Rantz, M. J. (2003). Focus on research methods. Research methods: Managing pri-mary study quality in meta-analysis. Research in Nursing & Health, 26, 322-333.

Dalziel,K.,Round,A.,Stein,K.,Garside,R.,Castelnuovo,E.,&Payne,L.(2005).Dothefindingsofcaseseriesstudiesvarysignificantlyaccordingtomethodologicalcharacteristics?Health Technology Assessment, 9, iii-iv, 1.

Deeks, J. J., Dinnes, J., D’Amico, R., Sowden, A. J., Sakarovitch, C., Song, F., et al. (2003). Evaluat-ing non-randomised intervention studies. Health Technology Assessment, 7, 1-173.

DeWalt, D. A., Berkman, N. D., Sheridan, S., Lohr, K. N., & Pignone, M. P. (2004). Literacy and health outcomes: A systematic review of the literature. Journal of General Internal Medicine, 19, 1228-1239.

DiCenso,A.,Prevost,S.,Benefield,L.,Bingle,J.,Ciliska,D.,Driever,M.,etal.(2004).Evidence-based nursing: Rationale and resources. Worldviews on Evidence-Based Nursing, 1, 69-75.

Downs, S. H., & Black, N. (1998). The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care inter-ventions. Journal of Epidemiology and Community Health, 52, 377-384.

54

Goldsmith, M. R., Bankhead, C. R., & Austoker, J. (2007). Synthesising quantitative and qualitative research in evidence-based patient information. Journal of Epidemiology and Community Health, 61, 262-270.

Greenhalgh, T., Robert, G., Macfarlane, F., Bate, P., Kyriakidou, O., & Peacock, R. (2005). Storylines of research in diffusion of innovation: A meta-narrative approach to systematic review. Social Science & Medicine, 61, 417-430.

Greer, N., Mosser, G., Logan, G., & Halaas, G. W. (2000). A practical approach to evidence grading. Joint Commission Journal on Quality Improvement, 26, 700-712.

Harden, A., Garcia, J., Oliver, S., Rees, R., Shepherd, J., Brunton, G., et al. (2004). Applying system-atic review methods to studies of people’s views: An example from public health research. Journal of Epidemiology and Community Health, 58, 794-800.

Jackson, N., & Waters, E. (2005). Criteria for the systematic review of health promotion and public health interventions. Health Promotion International, 20, 367-374.

Jefferson, T., & Demicheli, V. (1999). Relation between experimental and non-experimental study de-signs. HB vaccines: A case study. Journal of Epidemiology and Community Health, 53, 51-54.

Katrak, P., Bialocerkowski, A. E., Massy-Westropp, N., Kumar, S., & Grimmer, K. A. (2004). A sys-tematic review of the content of critical appraisal tools. BMC Medical Research Methodology, 4:22.

Khan, K., ter Riet, G., Popay, J., Nixon, J., & Kleijnen, J. (2001). Stage II: Conducting the review. Phase 5: Study quality assessment. In: K. S. Khan, G. ter Riet, J. Glanville, et al. Undertak-ing systematic reviews of research on effectiveness. CRD’s guidance for carrying out or com-missioning reviews 2nd Edition (pp. 1-20). York: NHS Centre for Reviews and Dissemination, University of York.

Lemmer, B., Grellier, R., & Steven, J. (1999). Systematic review of non-random and qualitative re-search literature: Exploring and uncovering an evidence base for health visiting and decision making. Qualitative Health Research, 9, 315-328.

Lethaby, A., Wells, S., & Furness, S. (2001). Handbook for the preparation of explicit evidence-based clinical practice guidelines. Aukland, New Zealand: New Zealand Guidelines Group, Effective Practice Institute of the University of Auckland.

Linde, K., Scholz, M., Melchart, D., & Willich, S. N. (2002). Should systematic reviews include non-randomizedanduncontrolledstudies?Thecaseofacupunctureforchronicheadache.Jour-nal of Clinical Epidemiology, 55, 77-85.

MacLehose, R. R., Reeves, B. C., Harvey, I. M., Sheldon, T. A., Russell, I. T., & Black, A. M. (2000). A systematic review of comparisons of effect sizes derived from randomised and non-ran-domised studies. Health Technology Assessment, 4, 1-154.

Margetts, B. M., Thompson, R. L., Key, T., Duffy, S., Nelson, M., Bingham, S., et al. (1995). Develop-mentofascoringsystemtojudgethescientificqualityofinformationfromcase-controlandcohort studies of nutrition and disease. Nutrition and Cancer, 24, 231-239.

55

McIntosh, H. M., Woolacott, N. F., & Bagnall, A. M. (2004). Assessing harmful effects in systematic reviews. BMC Medical Research Methodology, 4, 19.

Norris, S., & Atkins, D. (2005). Challenges in using nonrandomized studies in systematic reviews of treatment interventions. Annals of Internal Medicine, 142, 1112-1119.

Ogilvie, D., Egan, M., Hamilton, V., & Petticrew, M. (2005). Systematic reviews of health effects of socialinterventions:2.Bestavailableevidence:Howlowshouldyougo?Journal of Epidemi-ology and Community Health, 59, 886-892.

Oliver, S., Harden, A., Rees, R., Shepherd, J., Brunton, G., Garcia, J., et al. (2005). An emerging framework for including different types of evidence in systematic reviews for public policy. Evaluation, 11, 428-446.

Ramsay, C. R., Matowe, L., Grilli, R., Grimshaw, J. M., & Thomas, R. E. (2003). Interrupted time series designs in health technology assessment: Lessons from two systematic reviews of behavior change strategies. International Journal of Technology Assessment in Health Care, 19, 613-623.

Rangel, S. J., Kelsey, J., Colby, C. E., Anderson, J., & Moss, R. L. (2003). Development of a quality assessment scale for retrospective clinical studies in pediatric surgery. Journal of Pediatric Surgery, 38, 390-396.

Rotstein, D., & Laupacis, A. (2004). Differences between systematic reviews and health technology assessments:Atrade-offbetweentheidealsofscientificrigorandtherealitiesofpolicymak-ing. International Journal of Technology Assessment in Health Care, 20, 177-183.

Sandelowski, M., Barroso, J., & Voils, C. I. (2007). Using qualitative metasummary to synthesize qualitativeandquantitativedescriptivefindings.Research in Nursing & Health, 30, 99-111.

Slim, K., Nini, E., Forestier, D., Kwiatkowski, F., Panis, Y., & Chipponi, J. (2003). Methodological index for non-randomized studies (minors): Development and validation of a new instrument. ANZ Journal of Surgery, 73, 712-716.

Spencer, L., Ritchie, J., & Lewis, J. (2004). Quality in qualitative evidence: A framework for assessing research evidence.(2nded.)London:GovernmentChiefSocialResearcher’sOffice.

Stein, K., Dalziel, K., Garside, R., Castelnuovo, E., & Round, A. (2005). Association between method-ological characteristics and outcome in health technology assessments which included case series. International Journal of Technology Assessment in Health Care, 21, 277-287.

Steuten, L. M., Vrijhoef, H. J., van-Merode, G. G., Severens, J. L., & Spreeuwenberg, C. (2004). The health technology assessment – Disease management instrument reliably measured method-ologic quality of health technology assessments of disease management. Journal of Clinical Epidemiology, 57, 881-888.

Thomas, B. H., Ciliska, D., Dobbins, M., & Micucci, S. (2004a). A process for systematically reviewing the literature: Providing the research evidence for public health nursing interventions. World-views on Evidence-Based Nursing, 1, 176-184.

56

Thomas, J., Harden, A., Oakley, A., Oliver, S., Sutcliffe, K., Rees, R., et al. (2004b). Integrating quali-tative research with trials in systematic reviews. British Medical Journal, 328, 1010-1012.

Thomson, H., Atkinson, R., Petticrew, M., & Kearns, A. (2006). Do urban regeneration programmes improvepublichealthandreducehealthinequalities?AsynthesisoftheevidencefromUKpolicy and practice (1980–2004). Journal of Epidemiology and Community Health, 60, 108-115.

West, S., King, V., Carey, T. S., Lohr, K. N., McKoy, N., Sutton, S. F., et al. (2002). Systems to rate the strengthofscientificevidence.Evidence Report – Technology Assessment (Summary), 1-11.

Whittemore,R.,&Knafl,K.(2005).Theintegrativereview:Updatedmethodology.Journal of Ad-vanced Nursing, 52, 546-553.

Wong, O., & Raabe, G. K. (1996). Application of meta-analysis in reviewing occupational cohort stud-ies. Occupational and Environmental Medicine, 53, 793-800.

Zaza, S., Wright-De Aguero, L. K., Briss, P. A., Truman, B. I., Hopkins, D. P., Hennessy, M. H., et al. (2000). Data collection instrument and procedure for systematic reviews in the Guide to Com-munity Preventive Services. American Journal of Preventive Medicine, 18, 44-74.