1_2 biostatistics


  • 7/27/2019 1_2 biostatistics

    1/8

    1.2: Biostatistics 1.2.1

    history prospectively over a period of time. The purpose is to determine which characteristics, exposures, or risk factors are associated with a given outcome. Unlike cross-sectional or case-control studies, however, the outcome of interest in a cohort study occurs in the future, after the subject is enrolled.

    In the cardiovascular literature, one of the most prominent cohort studies is the Framingham study of cardiovascular risk factors, which started in 1948, when more than 5,000 individuals from the same Massachusetts town were enrolled. The cohort was then followed with examinations every two years to determine the association of various risk factors with cardiovascular diseases.

    Case Series

    A case series is a descriptive account of a collection of patients in which each case shares some characteristic of interest. A case series can be the first step in identifying a new disease process, describing a novel physical or imaging finding, or reporting on a novel treatment method. Case series reports can serve as a catalyst for other studies.

    Case-Control Studies

    Case-control studies are retrospective studies that start with individuals who already have a disease or trait of interest (i.e., the cases), then match them with control subjects who lack that disease or trait. The studies then look back at events, exposures, and characteristics to see whether any difference exists between the two groups. The idea is to find a risk factor that is more common in the history of the cases than in that of the controls.

    Cross-Sectional Studies

    Cross-sectional studies are descriptive studies of the characteristics of a group of individuals at a single point in time. These studies describe what is happening right now in a group of people. Cross-sectional studies can be used to establish norms (e.g., for a new biomarker), evaluate the usefulness of a new diagnostic procedure, or poll individuals about their attitudes (e.g., towards health care).

    Introduction

    One of the strengths of the field of cardiology is its strong evidence base. Cardiology is known for its large clinical trials, which provide a large amount of new information about treatments and practices. A well-qualified cardiologist must understand biostatistics to decide whether results presented in the literature can be believed and should be applied to the treatment of their patients.

    The purpose of this module is to provide a basic foundation in biostatistics so that the reader can better evaluate clinical literature. The focus is on the interpretation of research methods, rather than on calculations and computational details. This module emphasizes the biostatistical methods that the cardiovascular specialist is most likely to encounter in modern medical literature.

    An additional resource is the American Heart Association Scientific Statement that reviews the appropriate statistical evaluation of novel markers of cardiovascular risk. It provides an excellent summary and explanation of some of the most frequently used biostatistical methods in cardiovascular medicine.1

    Study Designs

    Medical research study designs fall into two major categories: 1) observational and 2) interventional. In observational studies, subjects are observed but no medical intervention is assigned by the investigators. The observations may be made prospectively (e.g., forward-looking cohort studies), retrospectively (e.g., backward-looking case-control studies), or at a single point in time (e.g., cross-sectional studies). Interventional studies, or clinical trials, evaluate the effects of an intervention on outcomes and are considered to provide a stronger level of evidence than observational studies. Understanding how a study is designed is essential to understanding the conclusions that can be drawn from it.
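    The distinctions between designs can be phrased as a short sequence of yes/no questions. The sketch below is illustrative only: the `StudyFeatures` fields and the `classify_design` function are hypothetical names, and real studies (e.g., retrospective cohorts) do not always fit rules this simple.

```python
from dataclasses import dataclass

@dataclass
class StudyFeatures:
    """Hypothetical yes/no features describing how a study was conducted."""
    assigns_intervention: bool          # did the investigators assign a treatment?
    randomized: bool = False            # was that assignment made by chance?
    cases_only: bool = False            # descriptive collection of patients sharing a characteristic
    single_time_point: bool = False     # characteristics measured at one point in time
    selected_by_outcome: bool = False   # subjects chosen because they already have the disease/trait

def classify_design(s: StudyFeatures) -> str:
    """Rough decision rules paraphrasing this module's study-design descriptions."""
    if s.assigns_intervention:
        return "randomized clinical trial" if s.randomized else "non-randomized clinical trial"
    # Everything below is observational: no intervention is assigned.
    if s.cases_only:
        return "case series"
    if s.single_time_point:
        return "cross-sectional study"
    if s.selected_by_outcome:
        return "case-control study"   # looks backward from the outcome
    return "cohort study"             # follows subjects forward to the outcome

# A Framingham-like design: enroll a group, assign nothing, follow forward in time
print(classify_design(StudyFeatures(assigns_intervention=False)))  # cohort study
```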

    Cohort Study

    A cohort study is an observational study that enrolls a group of subjects with something in common and follows their natural

    Chapter 1: General Principles

    1.2: Biostatistics

    Lori B. Daniels, MD, MAS, FACC

    Consulting Fees/Honoraria: Roche Diagnostics, Alere, Inc.; Research Grants: Roche Diagnostics.

    Learner Objectives

    Upon completion of this module, the reader will be able to:

    1. Correctly identify the study design used in a given medical study, and list its uses.

    2. Describe the p value and interpret its meaning and relationship to hypothesis testing.

    3. Calculate sensitivity, specificity, and positive and negative predictive values for a diagnostic test.

    4. Compare various methods to account for confounding variables in clinical studies, including multivariable regression and propensity analysis.

    5. Recognize how survival analysis differs from other regression analyses and identify when survival analysis should be used.


    mean, and 99.7% lie within 3 SDs of the mean. Even if the distribution is not bell-shaped, at least 75% of the values will always fall within 2 SDs of the mean.
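    The coverage figures above can be checked empirically. The sketch below draws hypothetical bell-shaped and right-skewed samples, computes the SD as the square root of the sample variance, and reports the share of values within 1, 2, and 3 SDs. The "at least 75% within 2 SDs" guarantee for any distribution is Chebyshev's inequality (at least 1 − 1/k² of values lie within k SDs). The data, seed, and function names are illustrative assumptions.

```python
import random
import statistics

def fraction_within(data, k):
    """Fraction of observations lying within k sample SDs of the sample mean."""
    m = statistics.mean(data)
    sd = statistics.stdev(data)          # square root of the sample variance (n - 1 denominator)
    return sum(abs(x - m) <= k * sd for x in data) / len(data)

def chebyshev_lower_bound(k):
    """Chebyshev's inequality: at least 1 - 1/k**2 of values lie within k SDs (k > 1)."""
    return 1 - 1 / k**2

rng = random.Random(42)
bell = [rng.gauss(100, 15) for _ in range(10_000)]           # roughly normal, hypothetical values
skewed = [rng.expovariate(1 / 20) for _ in range(10_000)]    # strongly right-skewed, hypothetical

for k in (1, 2, 3):
    print(k, round(fraction_within(bell, k), 3), round(fraction_within(skewed, k), 3))
print("Chebyshev bound for k=2:", chebyshev_lower_bound(2))   # 0.75
```

    For the bell-shaped sample the printed fractions should sit near 0.68, 0.95, and 0.997; the skewed sample departs from those figures but still respects the 75% floor at 2 SDs.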

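    The mean and SD are also useful for determining whether a set of observations is skewed when only summary statistics are provided. For a variable that cannot take negative values (e.g., a biomarker concentration), if the mean is smaller than 2 SDs, the data are probably skewed: a symmetric distribution with that mean and SD would place a noticeable fraction of values below zero.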
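    A minimal sketch of this rule of thumb, assuming the variable cannot be negative. The second function shows the reasoning behind the rule: it reports how much of a normal distribution with the reported mean and SD would fall below zero. The summary values and function names are hypothetical.

```python
from statistics import NormalDist

def mean_suggests_skew(mean, sd):
    """Heuristic for a variable that cannot be negative: mean < 2 SD hints at right skew."""
    return mean < 2 * sd

def implied_fraction_below_zero(mean, sd):
    """Share of values a normal distribution with this mean and SD would put below zero."""
    return NormalDist(mean, sd).cdf(0)

# Hypothetical reported summaries (mean, SD)
for label, mean, sd in [("biomarker A", 12, 9), ("systolic BP", 140, 20)]:
    print(label, mean_suggests_skew(mean, sd), round(implied_fraction_below_zero(mean, sd), 3))
```

    For "biomarker A", a normal distribution would put roughly 9% of patients below zero, which is impossible for a concentration, so the data are more plausibly right-skewed.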

    Hypothesis Testing

    The purpose of a hypothesis test is to permit generalizations about a population based upon observations made in a sample from that population. When making comparisons between two groups (e.g., a group that received some therapy vs. a group that received a placebo), the hypothesis of interest is that some difference exists between the two groups. The null hypothesis, which must be rejected in order to claim a difference, is that the two groups are equal.

    Errors in Hypothesis Testing

    Erroneous conclusions can arise from hypothesis tests in two ways. A type I error is analogous to a false-positive diagnostic test: it incorrectly concludes significance (and rejects the null hypothesis) when no true difference exists. A type II error is analogous to a false-negative diagnostic test: it incorrectly concludes no significance (and fails to reject the null hypothesis) when a true difference does exist. The probability of making a type II error is known as beta, or β.

    The significance level of a test is also known as alpha, or α. This is the probability of making a type I error (i.e., incorrectly concluding significance). For many statistical tests, the p value can be compared with the significance level either to detect a statistically significant difference (i.e., reject the null hypothesis) or to conclude that the null hypothesis cannot be rejected at that significance level. For most studies, a significance level of 0.05 is chosen.
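    Why α is the type I error rate can be seen by simulation: if both groups are drawn from the same population (so the null hypothesis is true), a test at α = 0.05 should still declare "significance" in about 5% of repetitions. The sketch assumes a two-sample z-test with a known SD of 1; the function names, sample size, and seed are illustrative choices.

```python
import math
import random
from statistics import NormalDist, mean

def two_sample_z_p_value(a, b, sd):
    """Two-sided p value for a difference in means, assuming a known common SD (z-test)."""
    se = sd * math.sqrt(1 / len(a) + 1 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulated_type_i_error(alpha=0.05, n=50, trials=2000, seed=1):
    """Draw both groups from the SAME population (null hypothesis true) and
    count how often the test is 'significant' anyway."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if two_sample_z_p_value(a, b, sd=1) < alpha:
            false_positives += 1
    return false_positives / trials

print(simulated_type_i_error())  # close to 0.05
```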

    Power

    The power of a statistical test is its ability to detect significance when a true difference exists. By analogy with diagnostic testing, the power of a statistical test corresponds to the sensitivity of a diagnostic test, or its ability to detect a disease that is present. Investigators want the statistical test to be sensitive enough to detect significance when it should be detected, thereby minimizing the risk of a type II error. Power is calculated as 1 − β, or 1 minus the probability of making a type II error.
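    As a sketch of how power relates to β, the function below approximates the power of a two-sided, two-sample z-test with equal group sizes and a known common SD. It is a normal approximation, not the exact calculation for a t-test, and the effect size, SD, and sample size in the example are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-sample z-test with
    equal group sizes and a known common SD."""
    std_normal = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)          # SE of the difference in means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    shift = abs(delta) / se                       # expected z statistic if the true difference is delta
    # Probability that the z statistic falls in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power = z_test_power(delta=5, sd=10, n_per_group=64)
beta = 1 - power
print(round(power, 2), round(beta, 2))  # 0.81 0.19
```

    When the true difference is zero, the "power" collapses to α itself, which is another way of seeing that detecting a nonexistent difference is a type I error.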

    P Values

    The p value is the probability of obtaining a result at least as extreme as the one observed, if the null hypothesis is true (i.e., the groups being compared are equal). It is sometimes described loosely as the probability that the observed result is due to chance alone; more precisely, it is calculated on the assumption that chance alone is operating. After a statistical test has been performed, if its p value is less than α (often set at 0.05), the null hypothesis is rejected.

    Importantly, a significant p value does not provide absolute proof that a difference between groups exists; rather, a p value of 0.05 or less means that if the groups do not differ, results as extreme as those observed would happen only 1 in 20 times or

    Clinical Trial

    A clinical trial is a study undertaken to determine whether a particular procedure or treatment can improve an outcome for a selected group of individuals. In controlled clinical trials, the intervention being tested is compared with another procedure or drug, generally a placebo or the current standard of care. Randomization assigns subjects to either the active treatment or the control group by chance, thereby eliminating bias in patient assignment and tending to distribute patient characteristics, both measured and unmeasured, evenly between groups.
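    A minimal sketch of chance-based assignment, assuming a 1:1 design like the 1,000-patient example later in this module. It shuffles a balanced list of labels (one of several possible randomization schemes) and then checks that a hypothetical baseline characteristic, age, ends up similar in both arms. The function name, seeds, and ages are illustrative assumptions.

```python
import random
import statistics

def randomize_equal_groups(patient_ids, seed=None):
    """Assign patients to 'active' or 'placebo' by chance, with equal group sizes
    (a shuffled, balanced allocation list; one of several possible schemes)."""
    n = len(patient_ids)
    labels = ["active"] * (n // 2) + ["placebo"] * (n - n // 2)
    random.Random(seed).shuffle(labels)
    return dict(zip(patient_ids, labels))

# Hypothetical trial: 1,000 patients with a baseline characteristic (age)
rng = random.Random(7)
ages = {pid: rng.gauss(65, 10) for pid in range(1000)}
assignment = randomize_equal_groups(list(ages), seed=2024)

for arm in ("active", "placebo"):
    arm_ages = [ages[p] for p, a in assignment.items() if a == arm]
    print(arm, len(arm_ages), round(statistics.mean(arm_ages), 1))
```

    Because assignment ignores every patient attribute, any characteristic, including ones nobody measured, is balanced between arms on average rather than by design.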

    In double-blind studies, neither the study investigator nor the subject knows whether the subject is in the treatment group or the control group, thus eliminating a potential source of bias. The most robust clinical trial design is considered to be the randomized, double-blind, placebo-controlled trial, because it can provide evidence of causation (i.e., the best indication that any effects seen are due to the intervention).

    Descriptive Statistics

    Measures of Central Tendency

    The correct measures for describing a population depend on the type of data being analyzed. The mean measures the middle of a distribution of a numerical variable if that variable has a normal (i.e., bell-shaped) distribution in the population being studied. The mean, also called the arithmetic mean, is the average of the observations. The mean is sensitive to extreme values, especially in small samples, so it is generally not used for skewed data.

    Instead, the median is used to measure the middle of a distribution of numerical variables that are skewed. Medians are also used for ordinal data, which are data that have an inherent order among categories (e.g., the New York Heart Association classification of heart failure severity). The median is the point at which half the observations are larger and half are smaller. Unlike the mean, it is unaffected by extreme values.
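    The contrast between the mean and the median is easy to see with a small, hypothetical set of hospital lengths of stay containing one extreme value. The sketch uses Python's standard `statistics` module; the data are invented for illustration.

```python
import statistics

# Hypothetical hospital lengths of stay (days); one patient stayed 40 days
stays = [4, 5, 5, 6, 6, 7, 40]

print(statistics.mean(stays))    # about 10.43: pulled upward by the single extreme value
print(statistics.median(stays))  # 6: half the stays are shorter, half are longer

# The median also suits ordinal data, e.g. hypothetical NYHA classes coded 1-4
nyha = [1, 2, 2, 2, 3, 3, 4]
print(statistics.median(nyha))   # 2
```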

    Measures of Variation

    Range: The range is the simplest measure of spread and is defined as the highest observed value minus the lowest observed value. One disadvantage of the range is that it tends to increase as the number of observations increases, since extreme values are more likely to occur with a greater number of data points. Consequently, reporting percentile values such as the 25th and 75th percentiles, or the 5th and 95th percentiles, is often preferred. The interquartile range (i.e., the difference between the 75th and 25th percentiles) is often used in conjunction with the median to describe a set of skewed observations.
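    The sketch below computes the range and the interquartile range for the same kind of hypothetical length-of-stay data, then replaces the extreme value to show that the range changes sharply while the IQR does not. It relies on `statistics.quantiles` with its default ("exclusive") method; other software may interpolate percentiles slightly differently.

```python
import statistics

def range_and_iqr(values):
    """Return (range, 25th percentile, 75th percentile, IQR) for a list of numbers."""
    data_range = max(values) - min(values)                 # highest minus lowest value
    q1, _median, q3 = statistics.quantiles(values, n=4)    # quartile cut points
    return data_range, q1, q3, q3 - q1

stays = [4, 5, 5, 6, 6, 7, 40]        # hypothetical lengths of stay (days)
no_outlier = [4, 5, 5, 6, 6, 7, 8]    # same data with the extreme value replaced

print(range_and_iqr(stays))       # (36, 5.0, 7.0, 2.0)
print(range_and_iqr(no_outlier))  # (4, 5.0, 7.0, 2.0)
```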

    Standard deviation: The most commonly used measure of dispersion is the standard deviation (SD), a measure of the spread of data about the mean. The SD is calculated as the square root of the variance, and the variance is the average of the squared deviations from the mean (for a sample, the sum of squared deviations is usually divided by n − 1 rather than n). If the distribution of observations is bell-shaped, then approximately 68% of observations are within 1 SD of the mean, 95% are within 2 SDs of the


    event. The NNT is the reciprocal of the ARR (i.e., NNT = 1/ARR).

    The relative risk reduction (RRR) is also frequently presented and is the amount of risk reduction relative to the baseline risk. It is calculated as the ARR divided by the baseline event rate (i.e., divided by the incidence in those without the exposure).

    Example: A new antiplatelet agent is being tested for its ability to decrease the incidence of myocardial infarction (MI) at 60 days. One thousand patients are randomized to either the new drug or a placebo, resulting in 500 people in each group. After 60 days, 15 patients in the active treatment group and 25 patients in the placebo group have experienced the primary outcome (i.e., MI). What is the NNT with this new medication to prevent one MI? What is the RRR?

    Answer: The incidence of MI in the treatment group was 15/500 = 0.03. The incidence in the placebo group was 25/500 = 0.05. Therefore, the ARR = 0.05 − 0.03 = 0.02, or 2%. The NNT = 1/ARR = 1/0.02 = 50. Therefore, 50 patients would need to be treated with the new medication for 60 days to prevent one MI. The RRR = 0.02/0.05 = 0.40, or 40%.
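    The same arithmetic can be written as a small function. This sketch uses exact fractions so that the NNT is not distorted by floating-point rounding, and it rounds the NNT up to the next whole patient, which is the usual convention when the result is not an integer. The function name is an illustrative choice.

```python
import math
from fractions import Fraction

def risk_summary(events_treated, n_treated, events_control, n_control):
    """ARR, RRR and NNT from event counts; Fractions keep the arithmetic exact."""
    risk_treated = Fraction(events_treated, n_treated)
    risk_control = Fraction(events_control, n_control)   # baseline event rate
    arr = risk_control - risk_treated                    # absolute risk reduction
    if arr <= 0:
        raise ValueError("no risk reduction; NNT is undefined")
    rrr = arr / risk_control                             # relative risk reduction
    nnt = math.ceil(1 / arr)                             # NNT is conventionally rounded up
    return arr, rrr, nnt

arr, rrr, nnt = risk_summary(15, 500, 25, 500)   # the antiplatelet example above
print(float(arr), float(rrr), nnt)               # 0.02 0.4 50
```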

    Assessing Yield of Diagnostic Tests

    An important part of cardiology is evaluating the accuracy of diagnostic tests. Even with advanced diagnostic technology such as cardiac CT scans, nuclear stress tests, and electrophysiology (EP) studies for diagnosing inducible arrhythmias, false-positive and false-negative test results remain possible. The accuracy of a diagnostic test depends on both its sensitivity and its specificity.

    Sensitivity

    Sensitivity is the probability of a positive test result in patients who have the condition. It is calculated as: true positives / (true positives + false negatives) (Table 1). A test with higher sensitivity is less likely to miss the disease. A very sensitive test, when negative, rules out disease. A helpful mnemonic for this is SNOUT: SeNsitive test = good for ruling OUT.

    Specificity

    The specificity of a test is the test's ability to identify individuals who do not have disease. More precisely, specificity is the probability of a negative test result in a patient who does not have the condition being measured. Specificity is calculated as: true negatives / (true negatives + false positives) (Table 1). A test with higher specificity misdiagnoses fewer people without the disease as having it. A very specific test, when positive, rules in disease. A helpful mnemonic for this is SPIN: SPecific test = rule IN.
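    Sensitivity and specificity, together with the predictive values described next, all come from the four cells of a 2×2 table of test result versus true disease status. The sketch below computes them from hypothetical counts; NPV is taken as true negatives / (true negatives + false negatives), its standard definition.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from the four cells of a 2x2 table (test result vs true disease status)."""
    return {
        "sensitivity": tp / (tp + fn),   # positives among those WITH disease
        "specificity": tn / (tn + fp),   # negatives among those WITHOUT disease
        "ppv": tp / (tp + fp),           # diseased among those testing positive
        "npv": tn / (tn + fn),           # disease-free among those testing negative
    }

# Hypothetical counts: 100 patients with disease, 900 without
m = diagnostic_metrics(tp=90, fp=40, fn=10, tn=860)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

    Note that sensitivity and specificity are computed down the columns of true disease status, whereas PPV and NPV are computed across the rows of test results.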

    Positive and Negative Predictive Values

    Performance of a diagnostic test can also be assessed by the positive predictive value (PPV) and the negative predictive value (NPV). The PPV is the probability that a patient whose test is positive actually has the disease. It is calculated as: true positives / (true positives + false positives). The NPV is the probability that a patient whose test is negative does not have the disease. It is

    less. Similarly, failure to detect a significant difference does not mean that a difference does not exist.
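    To make the definition of the p value concrete, the sketch below converts a test statistic into a two-sided p value, assuming the statistic follows a standard normal distribution when the null hypothesis is true (a z-test). The function name and z values are illustrative. A z of 1.0 gives p ≈ 0.32: not significant, yet not evidence that the groups are truly equal.

```python
from statistics import NormalDist

def two_sided_p_from_z(z):
    """Probability, computed assuming the null hypothesis is true, of a z statistic
    at least as extreme (in either direction) as the one observed."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.0, 1.96, 2.58):
    print(z, round(two_sided_p_from_z(z), 3))
```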

    The p value has often been subject to misinterpretation. The p value is not the probability that the null hypothesis is true. It also does not indicate the size or importance of the observed effect. Even an effect that is highly statistically significant (e.g., p