Statistics for Non-Statisticians
Kay M. Larholt, Sc.D.
Vice President, Biometrics & Clinical Operations
Abt Bio-Pharma Solutions
2
Topics
1) Basic Statistical Concepts
2) Study Design
3) Blinding and Randomization
4) Hypothesis testing
5) Power and Sample Size
3
Basic Statistical Concepts
4
Statistics
Per the American Heritage dictionary - “The mathematics of the collection, organization,
and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.”
• Two broad areas:
– Descriptive: the science of summarizing data
– Inferential: the science of interpreting data in order to make estimates, test hypotheses, make predictions, or reach decisions about the target population from the sample.
5
Introduction to Clinical Statistics
• Statistics - The science of making decisions in the face of uncertainty
• Probability - The mathematics of uncertainty – The probability of an event is a measure of how
likely the event is to happen
6
Sample versus Population
7
Clinical Statistics
• Biostatisticians are statisticians who apply statistics to the biological sciences.
• Clinical statistics are statistics that are applied to clinical trials
8
Basic Statistical Concepts
• Types of data
• Descriptive statistics
• Graphs
• Basic probability concepts
• Types of probability distributions in clinical statistics
• Sample vs. population
9
Types of Data
Qualitative:
• Gender – Male/Female
• Eye color – Blue, Brown, Other
• Race
• Diabetic – Yes/No
Quantitative:
• Age – in years
• Number of children in family
• Height – in inches
• Annual salary
• WBC count
10
Types of Quantitative Variables
Discrete vs. Continuous
• Discrete variables can only take certain values, and there are usually “gaps” between the values. Example: the number of children in a family (0, 1, 2, 3, ...).
• Continuous variables can take any value within a specific range. Examples: the time it takes to fly from Boston to New York; the price of a house.
11
Continuous Data
Data should be collected in its “rawest” form. We can always categorize data later. (We can never “uncategorize” data.)
– e.g., if you measure prostate size as part of the clinical trial, capture the size in mm on the CRF.
Patient   Size (mm)
1         24
2         45
3         26
4         23
5         67
Patient   Category
1         Between 21 and 40
2         Between 41 and 60
3         Between 21 and 40
4         Between 21 and 40
5         Between 61 and 80
We can categorize into 0-20 mm, 21-40 mm, etc. later.
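A minimal Python sketch of categorizing later, using the five sizes from the slide. The 20 mm bands (0-20, 21-40, 41-60, 61-80 mm) follow the slide's example categories; the function name `size_category` and the restriction to whole-mm values between 0 and 80 are assumptions for illustration.

```python
# Raw sizes captured on the CRF (patients 1-5 from the slide), in mm
sizes_mm = {1: 24, 2: 45, 3: 26, 4: 23, 5: 67}


def size_category(size_mm: int) -> str:
    """Map a raw size in whole mm to an assumed 20 mm band: 0-20, 21-40, 41-60, 61-80."""
    if not 0 <= size_mm <= 80:
        raise ValueError(f"{size_mm} mm is outside the assumed 0-80 mm range")
    if size_mm <= 20:
        return "0-20 mm"
    lower = (size_mm - 1) // 20 * 20 + 1  # lower edge of the band: 21, 41 or 61
    return f"{lower}-{lower + 19} mm"


# Categorizing is a one-way step: the raw values cannot be recovered from the bands
categories = {patient: size_category(mm) for patient, mm in sizes_mm.items()}
for patient, band in categories.items():
    print(patient, sizes_mm[patient], band)
```

Going the other way is impossible: the band "21-40 mm" alone cannot tell you whether a patient measured 23, 24 or 26 mm.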
12
Basic Data Summarization Techniques
• The objective of data summarization is to describe the characteristics of a data set. Ultimately, we want to make the data set more comprehensible and meaningful.
• To put data in a concise form, use:
– Summary descriptive statistics
– Graphs
– Tables
13
Descriptive Statistics for Continuous Variables
• Measures of central tendency: Mean, Median, Mode
• Measures of dispersion: Range, Variance, Standard deviation
• Measures of relative standing: Lower quartile (Q1), Upper quartile (Q3), Interquartile range (IQR)
14
Mean
Arithmetic average: sum of all observations divided by the number of observations.
$\bar{X} = \frac{\sum X}{N}$
Example: The average age of a group of 10 people is 24.2 years.
Who are they?
15
Mean
Answer:
• They could be ten “twenty-somethings” who go out to dinner together: Pete aged 24, Jane aged 26, Louise aged 21, Bob aged 22, Julie aged 23, Sue aged 22, Jenn aged 27, John aged 28, Jeff aged 20 and Mark aged 29.
• The mean age for these 10 people is: (24+26+21+22+23+22+27+28+20+29)/10 = 24.2 years
16
Mean
Or alternatively:
• They could be Mr. & Mrs. Smith and their 8 grandchildren: Susie aged 3, Abby aged 5, Max aged 8, Laura aged 10, Joshua aged 10, Emma aged 12, Jane aged 13, Sarah aged 18, Mrs. Smith aged 80, Mr. Smith aged 83.
• The mean age for these 10 people is: (3+5+8+10+10+12+13+18+80+83)/10 = 24.2 years
17
Mean
• Presenting the mean alone does not give you much information about the data you are looking at: both groups above have a mean age of 24.2 years, yet their ages are very different.
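A quick Python check (standard-library `statistics` module) that the two groups share a mean of 24.2 years while differing sharply in median and extremes; the variable names are illustrative.

```python
from statistics import mean, median

diners = [24, 26, 21, 22, 23, 22, 27, 28, 20, 29]     # ten "twenty-somethings"
smith_family = [3, 5, 8, 10, 10, 12, 13, 18, 80, 83]  # 8 grandchildren + grandparents

for name, ages in [("Diners", diners), ("Smith family", smith_family)]:
    # Same mean, very different data: the median and the min/max reveal it
    print(f"{name}: mean={mean(ages):.1f}, median={median(ages)}, "
          f"min={min(ages)}, max={max(ages)}")
```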
18
Median
• The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest.
• There are as many values above the median as below it in the data array.
19
Median
Example: The ages of the people in our data set are:
24, 26, 21, 23, 22, 27, 28, 20, 29 (I took out one of the 22-year-olds to make this example easier.)
Arranging the data in ascending order gives:
20, 21, 22, 23, 24, 26, 27, 28, 29
The median is 24, the fifth of the nine ordered values.
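The slide sidesteps the even-count case by dropping one 22-year-old. A short Python sketch of the usual convention, which averages the two middle values when the count is even (the same rule `statistics.median` applies); `median_of` is a hypothetical helper name.

```python
def median_of(values):
    """Middle value of the ordered data; for an even count, the mean of the two middle values."""
    ordered = sorted(values)  # ascending order (descending gives the same median)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: a single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


nine_ages = [24, 26, 21, 23, 22, 27, 28, 20, 29]
ten_ages = nine_ages + [22]  # put the second 22-year-old back
print(median_of(nine_ages), median_of(ten_ages))  # 24 23.5
```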
20
This well-known saying is part of a phrase attributed to Benjamin Disraeli and popularized in the U.S. by Mark Twain
There are three kinds of lies: lies, damned lies, and statistics.
21
Median Home Price
Connecticut: Darien
• Median home price: $1,295,000
• Location: about 40 miles northeast of midtown Manhattan
• Population: 20,209; households: 6,592
22
Properties of Mean and Median
• Each variable in the data set has exactly one mean and one median.
• Median is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
• Mean is a poor measure of central tendency in skewed distributions.
23
Mode
• The value of the observation that appears most frequently.
Example: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87.
Since the score of 81 occurs most often (three times), the modal score is 81.
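A minimal Python sketch of finding the mode with `collections.Counter`. It returns a list because a data set can have more than one most-frequent value; in Python 3.8+ `statistics.multimode` gives the same list for this data.

```python
from collections import Counter

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

counts = Counter(scores)    # score -> number of students with that score
top = max(counts.values())  # highest frequency
modes = sorted(s for s, c in counts.items() if c == top)  # a data set can have several modes
print(f"mode(s): {modes}, each occurring {top} times")
```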
24
Averages and What Else?
• As we have seen, just knowing the mean or even the median of a data set does not tell us enough about the data. We need more information to really describe the data.
25
Measures of Dispersion
• Once we know something about the centre of the data we need to understand how the data are dispersed around this centre.
• How variable are the data?
26
Range
Maximum value in the data set minus minimum value in the data set
1. The ages of the patients in our data set are: 21, 25, 19, 20, 22. Range = 25 - 19 = 6
2. The ages of the patients in our data set are: 21, 45, 19, 20, 22. Range = 45 - 19 = 26
When the maximum and minimum are unusual values, the range may be a misleading measure of dispersion. The range uses only the 2 extreme values in the data.
27
Variance and Standard Deviation
The variance of a data set measures how far, on average, the data points are from the mean of the data set (using squared distances).
It provides a measure of how spread out the data points are.
The standard deviation is the square root of the variance.
28
Variance and Standard Deviation
Variance: measure of dispersion; the sum of the squared deviations of the data from the mean, divided by n - 1 for a sample (roughly, the average squared deviation)
Standard deviation: positive square root of the variance
Small std dev: observations are clustered tightly around the mean
Large std dev: observations are scattered widely about the mean
29
Standard Deviation
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
1. Subtract the mean of the observations from each observation
2. Square each answer
3. Sum up all the results
4. Divide by n - 1
5. Take the square root
30
Example – Standard Deviation
1. The ages of the patients in our data set are: 21, 25, 19, 20, 22
Mean = 21.4, Median = 21, StdDev = 2.302
2. The ages of the patients in our data set are: 21, 45, 19, 20, 22
Mean = 25.4, Median = 21, StdDev = 11.014
(Figure: the two data sets plotted on a number line: 19, 20, 21, 22, 25 and 19, 20, 21, 22, 45)
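The standard deviation steps written out in Python and applied to the two example data sets. `sample_sd` is an illustrative helper; `statistics.stdev` (which also divides by n - 1) should agree with it. The printout includes the range as well, to contrast how each measure reacts to the value 45.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation with an n - 1 denominator, computed step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n                            # mean of the observations
    squared_devs = [(x - xbar) ** 2 for x in values]  # subtract the mean, square
    variance = sum(squared_devs) / (n - 1)            # sum, divide by n - 1
    return math.sqrt(variance)                        # square root


data1 = [21, 25, 19, 20, 22]
data2 = [21, 45, 19, 20, 22]
for data in (data1, data2):
    print(f"mean={statistics.mean(data):.1f}  median={statistics.median(data)}  "
          f"range={max(data) - min(data)}  sd={sample_sd(data):.3f}")
```

The single value 45 moves the mean by 4 and roughly quintuples the standard deviation, while the median does not change.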
31
Choosing an Appropriate Method of Central Tendency
The mean is ordinarily the preferred measure of central tendency. The mean should always be presented along with the variance or the standard deviation.
There are situations when the median might be more appropriate:
- a skewed distribution
- a small number of subjects
32
Measures of Relative Standing
• Descriptive measures that locate the relative position of an observation in relation to the other observations.
33
Measures of Relative Standing
• The pth percentile is a number such that p% of the observations of the data set fall below it and (100-p)% of the observations fall above it.
– Lower quartile = 25th percentile (Q1)
– Mid-quartile = 50th percentile (median or Q2)
– Upper quartile = 75th percentile (Q3)
– Interquartile range: IQR = Q3 - Q1
34
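Measures of Relative Standing… an Example
The ages of the patients in our data set are: 21, 25, 19, 20, 22
Q1 = 20, Q2 = 21, Q3 = 22, IQR = 2
The ages of the patients in our data set are: 21, 45, 19, 20, 22
Q1 = 20, Q2 = 21, Q3 = 22, IQR = 2
Unlike the range and the standard deviation, the quartiles and the IQR are unchanged by the extreme value 45.
(Figure: the two data sets plotted on a number line: 19, 20, 21, 22, 25 and 19, 20, 21, 22, 45)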
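Software packages use several different quartile conventions, and they can disagree on small data sets. In this Python sketch, the `"inclusive"` method of `statistics.quantiles` (Python 3.8+) happens to reproduce the slide's values; the default `"exclusive"` method would give Q1 = 19.5 here. The `quartiles` helper name is illustrative.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3, IQR) using the 'inclusive' interpolation method."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


for data in ([21, 25, 19, 20, 22], [21, 45, 19, 20, 22]):
    q1, q2, q3, iqr = quartiles(data)
    print(f"Q1={q1}  Q2={q2}  Q3={q3}  IQR={iqr}")
```

When reporting quartiles, it is worth stating which convention (or which software) was used.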
35
Definitions
• Statistics - The science of making decisions in the face of uncertainty
• Probability - The mathematics of uncertainty – The probability of an event is a measure of
how likely the event is to happen
36
Basic Probability Concepts
Sample spaces and events
Simple probability
Joint probability
37
Sample Spaces
• Collection of all possible outcomes
– Example: All six faces of a die
– Example: All 52 cards in a deck
38
Sample Space
Gumballs in a gumball machine
60 red
50 green
40 yellow
30 white
25 pink
20 blue
16 purple
Total: 241 gumballs
39
Events
Simple event: an outcome from a sample space with one characteristic
Examples: A red card from a deck of cards
A purple gumball from the gumball machine
Joint event: involves two characteristics simultaneously
Example: An ace that is also red from a deck of cards
40
Events
Mutually exclusive events: two events that cannot occur together
Example: Drawing one card from a deck
A: Drawing the queen of diamonds
B: Drawing the queen of clubs
Since only one of these can happen, events A and B are mutually exclusive.
41
Probability
• Probability is the numerical measure of the likelihood that an event will occur
• Its value is between 0 and 1 (inclusive)
(Figure: probability scale from 0 = impossible, through .5, to 1 = certain)
42
Computing Probabilities
The probability of an event E:
$P(E) = \frac{\text{number of event outcomes}}{\text{total number of possible outcomes in the sample space}}$
This assumes each of the outcomes in the sample space is equally likely to occur.
43
Computing Probabilities
Example:
What is the probability of rolling a 4 when you roll a die?
# of possible outcomes in the sample space = 6
# of 4s in the sample space = 1
Prob (rolling a 4 when you roll a die) = 1/6
44
Computing Probabilities
Example:
What is the probability of rolling one six and one four when you roll 2 dice?
# of possible outcomes in the sample space = 36
# of ways to roll one 6 and one 4 = 2 (6 then 4, or 4 then 6)
P(one 6 and one 4) = 2/36 ≈ 0.0556
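Both dice examples can be checked by listing the sample space explicitly, assuming fair dice so that all outcomes are equally likely. Treating the two dice as distinguishable (first die, second die) gives 36 outcomes, and exactly two of them, (4, 6) and (6, 4), contain one 6 and one 4. A Python sketch with exact fractions:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair six-sided die

# One die: 6 equally likely outcomes, one of which is a 4
p_four = Fraction(sum(1 for face in faces if face == 4), len(faces))

# Two dice: 36 equally likely ordered outcomes (first die, second die)
two_dice = list(product(faces, repeat=2))
one_six_one_four = [roll for roll in two_dice if sorted(roll) == [4, 6]]
p_six_and_four = Fraction(len(one_six_one_four), len(two_dice))

print(p_four, one_six_one_four, p_six_and_four, float(p_six_and_four))
```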
45
Computing Joint Probability
The probability of a joint event, A and B:
$P(A \text{ and } B) = \frac{\text{number of outcomes from both A and B}}{\text{total number of possible outcomes in the sample space}}$
46
Computing Joint Probability
P(Red Card and an Ace)
= 2 red aces / total # of cards
= 2/52 = 1/26
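The same counting approach for the joint event, sketched in Python: build the 52-card deck, keep the cards that are both aces and red, and divide by the deck size. The suit names and the `suit_colour` mapping are just illustrative data.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colour = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
deck = list(product(ranks, suit_colour))  # 52 equally likely (rank, suit) cards

# Joint event: the card is an ace AND the card is red
red_aces = [(rank, suit) for rank, suit in deck if rank == "A" and suit_colour[suit] == "red"]
p_red_ace = Fraction(len(red_aces), len(deck))
print(red_aces, p_red_ace)
```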
47
Type of Probability Distributions in Clinical Statistics
Bernoulli
Binomial
Normal
48
Bernoulli Distribution
The Bernoulli distribution is the “coin flip” distribution.
X is Bernoulli if its probability function is:
$P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$
Examples:
• X = 1 for heads in a coin toss
• X = 1 for male in a survey
• X = 1 for defective in a test of a product
49
Binomial Distribution
• The binomial distribution is the sum of n independent Bernoulli variables, each with the same probability of success.
• It is the number of “successes” in n trials.
• The probability of success is usually denoted by p, and therefore the probability of failure is 1-p.
Example: Number of heads when we flip a coin 10 times. Here n = 10, p = 0.5 (the probability of getting a head when we toss the coin once).
50
Binomial Distribution
• The binomial probability function
$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$
Example: X = Number of heads when we flip a coin 10 times. Here X ~ Binomial(n = 10, p = 0.5)
n! = n factorial = n × (n-1) × (n-2) × … × 1
10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3,628,800
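A minimal Python version of the binomial probability function above; `math.comb(n, x)` (Python 3.8+) computes the n!/(x!(n-x)!) term exactly. The helper name `binomial_pmf` is hypothetical.

```python
from math import comb, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): n!/(x!(n-x)!) * p^x * (1-p)^(n-x)."""
    if not 0 <= x <= n:
        return 0.0
    # comb(n, x) is exactly n! / (x! (n - x)!)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


print(factorial(10))                 # 3628800
print(comb(10, 5))                   # 252
print(binomial_pmf(5, n=10, p=0.5))  # 0.24609375
```

So the chance of exactly 5 heads in 10 fair tosses is only about 25%.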
51
Binomial Distribution
• Expectation: $E(X) = np$
• Variance: $\mathrm{Var}(X) = np(1-p)$
X = Number of heads when we flip a coin 10 times. Here X ~ Binomial(n = 10, p = 0.5).
Then E(X) = 5 (on average we expect to get 5 heads) and Var(X) = 2.5.
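The formulas E(X) = np and Var(X) = np(1-p) can be checked directly from the definitions by summing over the probability function. A Python sketch using exact fractions so that no rounding is involved:

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 2)  # X ~ Binomial(n = 10, p = 0.5), exact arithmetic
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

expectation = sum(x * px for x, px in pmf.items())                    # E(X) = sum of x * P(X = x)
variance = sum((x - expectation) ** 2 * px for x, px in pmf.items())  # Var(X) = sum of (x - E)^2 * P(X = x)
print(expectation, variance)
```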
52
Gaussian or Normal Distribution aka “Bell Curve”
• Most important probability distribution in the statistical analysis of experimental data.
• Data from many different types of processes follow a “normal” distribution, e.g.:
– Heights of American women
– Returns from a diversified asset portfolio
• Even when the data do not follow a normal distribution, the normal distribution often provides a good approximation, e.g., for the distribution of sample means in large samples (the central limit theorem).
53
Gaussian or Normal Distribution aka “Bell Curve”
The Normal Distribution is specified by two parameters:
– The mean, µ
– The standard deviation, σ
54
Standard Normal Distribution
(Figure: standard normal curve with µ = 0 and σ = 1)
55
Characteristics of the Standard Normal Distribution
• Mean µ of 0 and standard deviation σ of 1.
• It is symmetric about 0 (the mean, median and mode are the same).
• The total area under the curve is equal to one. One half of the total area under the curve is on either side of zero.
56
Area in the Tails of Distribution
• The total area under the curve that is more than 1.96 units (standard deviations) away from zero is equal to 5%. Because the curve is symmetric, there is 2.5% in each tail.
57
Normal Distribution
• About 68% of observations lie within ± 1 std dev of the mean
• About 95% of observations lie within ± 2 std dev of the mean (exactly 95% within ± 1.96 std dev)
• About 99.7% of observations lie within ± 3 std dev of the mean
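Python's `statistics.NormalDist` (3.8+) can confirm these areas for the standard normal curve: the two tails beyond ±1.96 hold about 5% in total, and the areas within ±1, ±2 and ±3 standard deviations come out to roughly 68.3%, 95.4% and 99.7%.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

# Two-tailed area beyond +/-1.96: about 0.05 in total, about 0.025 in each tail
tail_area = 2 * (1 - z.cdf(1.96))
print(f"P(|Z| > 1.96) = {tail_area:.4f}")

# Areas within +/-1, 2 and 3 standard deviations of the mean
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"P(|Z| < {k}) = {area:.4f}")
```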
58
Study Design
59
Sample versus Population
• A population is a whole, and a sample is a fraction of the whole.
• A population is a collection of all the elements we are studying and about which we are trying to draw conclusions.
• A sample is a collection of some, but not all, of the elements of the population
60
Sample versus Population
61
Sample versus Population
• To make generalizations from a sample, it needs to be representative of the larger population from which it is taken.
• In the ideal scientific world, the individuals for the sample would be randomly selected. This requires that each member of the population has an equal chance of being selected each time a selection is made.
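A minimal Python sketch of simple random sampling, assuming a hypothetical sampling frame of 500 patient IDs. `random.Random.sample` draws without replacement, so every subset of 20 is equally likely and each member of the population has the same 20/500 chance of being included.

```python
import random

# Hypothetical sampling frame: 500 patient IDs standing in for the population
population = [f"patient-{i:03d}" for i in range(1, 501)]

rng = random.Random(2024)              # fixed seed so the draw can be reproduced and audited
sample = rng.sample(population, k=20)  # simple random sample without replacement
print(sample[:5])
```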
62
Type of Studies and Study Design
• Phase I – IV
• Controlled vs. non-controlled studies
• Single arm, parallel groups, cross-over designs, and stratified designs
• Selecting an appropriate study design
• Analysis population: Intent-to-treat vs. per-protocol
63
Phases of Clinical Trials
• Clinical trials are generally categorized into four phases.
• An investigational medicine or product may be evaluated in two or more phases simultaneously in different trials, and some trials may overlap two different phases.
64
Phase 1 Studies – Safety and Dosing
• Initial safety trials in which investigators attempt to establish the dose range tolerated by 20-80 healthy volunteers.
• Although usually conducted on healthy volunteers, Phase 1 trials are sometimes conducted with severely ill patients, for example those with cancer or AIDS.
65
Phase 2 Studies – Safety and Limited Efficacy
• Pilot clinical trials to evaluate safety and efficacy in selected populations of about 100-300 patients who have the disease or condition to be treated, diagnosed, or prevented. Often referred to as feasibility studies
• Used as dose finding studies as different doses and regimens are investigated
66
Phase 3 studies - efficacy
• Large “definitive” studies that are carried out once safety has been established and doses that are likely to be effective have been found
• Often called “pivotal” studies
• The FDA usually requires 2 Phase 3 studies for registration
67
Phase 4 studies – post marketing surveillance
• After the product is marketed, Phase 4 studies provide additional details about the product’s safety and efficacy.
• May be used to evaluate formulations, dosages, durations of the treatment, medicine interactions, and other factors.
• Patients from various demographic groups may be studied.
68
Phase 4 studies – post marketing surveillance
• Important part of many Phase 4 studies: detecting and defining previously unknown or inadequately quantified adverse reactions and related risk factors.
• Phase 4 studies are often observational studies rather than experimental.
69
Hierarchy of medical evidence
• From weakest to strongest evidence:
– Case reports
– Case series
– Database studies
– Observational studies
– Controlled clinical trials
– Randomized controlled trials
Byar, 1978
70
Clarke MJ. Ovarian ablation in breast cancer, 1896 to 1998: milestones along hierarchy of evidence from case report to Cochrane review. BMJ 1998; 317.
71
Controlled studies
• Studies in which a test article is compared with a treatment that has known effects.
• The control group may receive no treatment, standard treatment or placebo.
72
What is a randomized clinical trial?
• A prospective study in humans
• Randomization
• Comparable control group
• Complete accounting of all cases
• Carefully monitored for safety and efficacy
• Adheres to regulatory requirements: GCP, FDA, ICH guidelines
73
Blinded studies
• Blinded study: one in which the subject or the investigator (or both) is unaware of which trial product a subject is receiving.
– Single-blind study: subjects do not know what treatment they are receiving (active or control)
– Double-blind study: neither the subjects nor the investigators know what treatment a subject is receiving
74
Analysis Populations
75
Intent-to-Treat Principle
• Primary analysis in most randomized clinical trials testing new therapies or devices.
• Requires that any comparison among treatment groups in a randomized clinical trial is based on the results for all subjects in the treatment group to which they were randomly assigned.
• Full analysis: includes compliers and non-compliers
76
Intent-to-Treat
ITT Population includes the following:
All Randomized patients: Preserve initial randomization
- Prevents biased comparison
- Basis for statistical tests and inference
77
Intent-to-Treat
Problems: Predictable or Unpredictable
• Ineligible patients allowed in the trial
• Non-compliance, i.e., not following the assigned treatment
• Patients refusing a trial procedure
• Prohibited medication
• Early withdrawal/termination
• Invalid data
78
Intent-to-Treat
FDA guideline related to regulatory submission states
‘As a general rule, even if the sponsor’s preferred analysis is based on a reduced subset of the patients with data, there should be an additional “intent-to-treat” analysis using all randomized patients.’
Ref: ICH E3: Structure and Content of Clinical Study Reports
79
Intent-to-Treat
When can we exclude randomized patients?
• Failure to satisfy major entry criteria
• Failure to take at least one dose of medication
• Failure to complete procedure
• Lack of any data post-randomization
• Lost to follow up
• Missing data randomly, not related to treatment assignment
80
Intent-to-Treat
Problem: In a 6-month study, what should be done with the patient who drops out and provides no further data after 2 months?
81
Intent-to-Treat
Last Observation Carried Forward (LOCF)
Use last available valid observation post-baseline on a particular variable for the missing visit through the end of study
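A minimal Python sketch of LOCF for a single patient's visit series, assuming missing visits are stored as `None` and the first value is baseline. Following the definition above, only post-baseline observations are carried forward; the visit labels match the figure's time axis, and the example values are hypothetical.

```python
from typing import List, Optional

VISITS = ["Baseline", "Week 1", "Week 2", "Week 4", "Week 8", "Week 12"]


def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward over one patient's visits.

    values[0] is baseline; None marks a missing visit. Only post-baseline
    observations are carried forward, so a missing visit before any
    post-baseline value is left as None.
    """
    filled: List[Optional[float]] = [values[0]]
    last_post_baseline: Optional[float] = None
    for value in values[1:]:
        if value is not None:
            last_post_baseline = value
        filled.append(value if value is not None else last_post_baseline)
    return filled


observed = [20.0, 18.0, 16.0, None, None, None]  # hypothetical drop-out after Week 2
print(dict(zip(VISITS, locf(observed))))
```

The imputed Week 4 to Week 12 values simply freeze the patient at their Week 2 result, which is why LOCF can mislead when the reason for dropping out is related to treatment.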
82
LOCF – last observation carried forward
(Figure: Y data, on an axis from 8 to 26, plotted over time at Baseline, Week 1, Week 2, Week 4, Week 8 and Week 12)
83
Last Observation Carried Forward (LOCF)
Biased if the early withdrawal is treatment related (e.g., patients drop out because of side effects or lack of efficacy)
84
Example
The primary analysis sample will be based on the principle of intention-to-treat. All patients who sign the written Informed Consent form, meet the study entry criteria, and undergo randomization will be included in the analysis, regardless of whether or not the assigned treatment device was implanted.
85
Intent-to-Treat Principle
• Using the complete analysis data set:
– Preserves the randomization at the time of analysis, which helps prevent bias
– Provides the foundation for statistical testing
– Provides estimates of treatment effects that are more likely to mirror those observed in clinical practice
86
Argument against ITT
• By including subjects who were randomized to the drug but received little or no drug, an ITT analysis will dilute the treatment effect when compared to the placebo group
87
How can we improve the ITT analysis?
• Careful identification of inclusion/exclusion criteria
• Careful review of reasons for failure, missing data, and exclusions
• Adherence to Good Clinical Practices
• Better monitoring practices to reduce protocol deviations and non-compliance
• Appropriate and detailed statistical plan and analysis
88
Per-Protocol aka Evaluable patient population
Subset of the ITT population who are compliant with the protocol, excluding patients with:
• Major protocol violations/deviations
• Use of medication prohibited by the protocol
• Technical or procedural failure
• Loss to follow-up, lack of efficacy/response
• Wrong treatment assignment
89
Per-Protocol Population
Advantages and disadvantages:
• Analysis in its pure form, completely as per the protocol
• Maximizes the apparent efficacy of the new treatment
• Not a conservative approach; the exclusions can introduce bias
90
Per-Protocol Population
Advantages and disadvantages:
• May not have enough power because of the reduced sample size
• Both analyses are done in confirmatory trials
• If the results and conclusions from the two analyses are the same, confidence in them is higher.
91
Blinding and Randomization
92
Randomisation
93
History
• The concept of randomisation was introduced by R.A. Fisher in 1926 in the area of agricultural research.
• Before that, clinical trials in the 18th and 19th centuries had used controls from the literature, other historical controls and concurrent controls.
94
Randomisation
• To guard against any use of judgement or systematic arrangements, i.e., to avoid bias
• To provide a basis for the standard methods of statistical analysis, such as significance tests
• To ensure that treatment groups are balanced (on average) in all regards, i.e., balance occurs for known prognostic variables and for unknown or unrecorded variables
95
• Inferential statistics calculated from a clinical trial make an allowance for differences between patients, and this allowance will be correct on average if randomisation has been employed.
96
• Randomisation promotes confidence that we have acted in utmost good faith. It is not to be used as an excuse for ignoring the distribution of known prognostic factors.
• Randomisation is essential for the effective blinding of a clinical trial.
97
Non-Randomised Trials
• It is difficult to obtain a reliable assessment of treatment effect from non-randomised studies.
98
Uncontrolled Trials
• Medical Practice implies that a doctor prescribes a treatment for a patient that in his/her judgement, based on past experience, offers the best prognosis.
• Clinicians are always looking for new therapies, improvements in therapies and alternative therapies.
99
• When a new treatment is proposed some clinicians might try it on a few patients in an uncontrolled trial.
• The new treatment is studied without any direct comparison with a similar group of patients on more standard therapy.
100
• Uncontrolled trials have the potential to provide a very distorted view of therapy.
• Why?
101
Laetrile
• In the 1970s in the US, Laetrile achieved widespread popular support for treating advanced cancer of all types without any formal testing in clinical trials.
• NCI tried to collect documented cases of tumour response after Laetrile therapy. Although an estimated 70,000 cancer patients had tried Laetrile, only 93 cases were submitted for evaluation and 6 were judged to have a response.
102
Laetrile
• An uncontrolled trial of 178 patients found no benefit and evidence of cyanide toxicity
• The final conclusion of NCI was that “Laetrile is a toxic drug that is not effective as a cancer treatment”
103
• Uncontrolled trials are much more likely to lead to enthusiastic recommendation of the treatment as compared with properly controlled trials.
104
Historical Controls
• Instead of randomising patients to groups, these studies compare the current patients on the new treatment with previous patients who had received the standard treatment.
• This is a Historical Control group.
105
• Major flaw: How can we be sure that the comparison is fair? How do we know whether the 2 groups differ with respect to any feature other than the treatment itself?
106
Patient Selection
• Historical control group is less likely to have clearly defined criteria for patient inclusion because the patients on the standard treatment were not known to be in the clinical trial when their treatment began.
• Historical controls were recruited earlier and possibly from a different source, and therefore might be a different type of patient.
• Investigator might be more restrictive in choice of patients for new treatment
107
Concurrent Non-randomised Controls
• Use some pre-determined systematic method or investigator judgement to assign patients to groups
108
Non-Randomised controls
• Date of Birth – odd/even day of birth = new/standard treatment
• Date of presentation – odd/even days = new/standard treatment
• Alternate assignment – odd/even patients= new/standard treatment
109
Example
• Trial of anticoagulant therapy for MI
• Patients admitted on odd days of the month received anticoagulant and patients admitted on even days did not.
Treated: N = 589
Control: N = 442
110
• Is it ethical to randomise?
– Assuming we have a sufficient supply of the new treatment, why shouldn’t every new patient be given the new treatment?
111
• Tendency is to do non-randomised trial first and then follow up with RCT.
• However it is difficult to do the RCT if the results from the non-randomised trial are too good.
112
• We assume that the new treatment has a reasonable chance of being an improvement.
• Before agreeing to enter patients into a randomised trial the investigator must be prepared to stay objective about the treatments involved.
• Randomised trials often produce scientific evidence that contradicts prior beliefs.
113
Equipoise
• What is “equipoise” and why is it important?
– A state of being equally balanced
• Clinical equipoise provides the ethical basis for medical research involving randomly assigning patients to different treatment arms.
114
Clinical Equipoise
The term was first used by B. Freedman in 1987, in the article 'Equipoise and the ethics of clinical research', NEJM 1987; 317(3).
“The ethics of clinical research requires equipoise - a state of genuine uncertainty on the part of the clinical investigator regarding the comparative therapeutic merits of each arm in a trial. Should the investigator discover that one treatment is of superior therapeutic merit, he or she is ethically obliged to offer that treatment.”
115
Clinical Equipoise
Freedman suggests that as long as there is genuine uncertainty within the expert medical community about the preferred treatment, there can be clinical equipoise, even if a specific investigator has a preference.
116
Randomisation
117
Randomisation
• Randomised trial with two treatments, A or B
• How do we assign treatments?
– Toss a coin each time: Heads = A, Tails = B
– Random numbers table
– Random permuted blocks
118
Flip a coin
• Could flip a coin for each participant: this is called complete randomisation or simple randomisation
• Problem: can get imbalance in groups, especially in smaller trials
– Imbalance in prognostic factors more likely
– Inefficient for estimating treatment effect
119
Probability of 5 Treated and 5 Controls in 10 patients
• What is the probability of getting 5 Treated patients out of 10?
• Remember the binomial distribution
120
Binomial Distribution
• The binomial probability function
$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$
X ~ Binomial (n = 10, p=0.5)
In this case, we want x=5
121
Imbalance with 10 Participants
Split (#T, #C)      Probability   Relative efficiency
(5,5)               .246          1
(4,6) or (6,4)      .410          .96
(3,7) or (7,3)      .234          .84
(2,8) or (8,2)      .088          .64
(1,9) or (9,1)      .020          .36
(0,10) or (10,0)    .002          0
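The table can be reproduced in Python from the binomial distribution with n = 10 and p = 0.5. The "Efficiency" column appears to be the relative efficiency for comparing two group means: the variance of the treatment difference under a 5/5 split divided by its variance under the actual split, which simplifies to 4·nT·nC/n². That formula matches every value in the table, but it is our reading rather than a definition given on the slide.

```python
from math import comb

n = 10  # participants, each allocated by an independent fair coin toss
rows = []
for n_t in range(n // 2, n + 1):
    n_c = n - n_t
    # count both orders, e.g. (6,4) and (4,6), except the single balanced split (5,5)
    ways = comb(n, n_t) if n_t == n_c else 2 * comb(n, n_t)
    probability = ways / 2**n
    # relative efficiency vs. a 5/5 split: (1/5 + 1/5) / (1/n_t + 1/n_c) = 4*n_t*n_c/n**2,
    # taken as 0 when one arm is empty
    efficiency = 4 * n_t * n_c / n**2
    rows.append((n_t, n_c, round(probability, 3), round(efficiency, 2)))

for n_t, n_c, probability, efficiency in rows:
    label = f"({n_t},{n_c})" if n_t == n_c else f"({n_c},{n_t}) or ({n_t},{n_c})"
    print(f"{label:18} probability {probability:.3f}  efficiency {efficiency:.2f}")
```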
122
• Even if treatment is balanced at the end of the trial, it may be unbalanced at some earlier point
• E.g., it may be balanced at the end with 400 participants, but the first 10 might be
CCCCTCTCTC
123
Random Permuted Blocks
• To balance over time, could randomize in blocks (called random permuted blocks)
• Conceptually, for blocks of size 4: put 2 T labels & 2 C labels in hat: for next 4 participants, draw labels at random without replacement from hat
• TTCC TCTC TCCT CTTC CTCT CCTT all equally likely
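A minimal Python sketch of the "labels in a hat" procedure: each block starts with equal numbers of T and C labels, and `random.shuffle` puts them in a random order, which makes all six orderings of TTCC equally likely. The function name `permuted_blocks` and the fixed seed are illustrative.

```python
import random
from typing import List, Optional


def permuted_blocks(n_blocks: int, block_size: int = 4, seed: Optional[int] = None) -> List[str]:
    """Allocation list made of random permuted blocks, each with equal numbers of T and C."""
    if block_size % 2:
        raise ValueError("block size must be even for 1:1 allocation")
    rng = random.Random(seed)
    schedule: List[str] = []
    for _ in range(n_blocks):
        block = ["T"] * (block_size // 2) + ["C"] * (block_size // 2)  # labels in the "hat"
        rng.shuffle(block)  # drawing them out in random order, without replacement
        schedule.extend(block)
    return schedule


schedule = permuted_blocks(n_blocks=3, block_size=4, seed=7)
print(" ".join("".join(schedule[i:i + 4]) for i in range(0, len(schedule), 4)))
```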
124
Forces balance after every 4
TCTC CCTT CTCT (participants 1 to 12: each block of 4 contains 2 T and 2 C)
125
Randomisation by blocks – 5 sites, 6 patients per site
Site   Patients 1 to 6
1      A A B A B B
2      B A A A B B
3      B B B A A A
4      A B A B A B
5      A A B B B A
126
Incomplete Blocks
• What happens if a site does not enroll all the patients in a block?
• What happens if multiple sites do not enroll all the patients in a block?
127
• The smaller the block size, the more often balance is forced: e.g., in a trial of 100,
– blocks of size 2 force balance after every 2 participants
– a block of size 100 forces balance only at the end
128
• With blocks of size 2 in an unblinded trial, we know every second participant’s assignment in advance
• I can veto potential participants until I find one I like (a sick one if the next assignment is control, a healthy one if the next assignment is treatment)
• Schulz KF. Subverting randomization in controlled trials. JAMA 1995; 274.
129
• Even with larger blocks, in an unblinded trial you know some assignments in advance
• With blocks of size 8 if first 6 are TCTTCT, we know next 2 are C
• Using a variable block size in a study makes it harder to guess
• Never include the block size in a protocol
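Variable block sizes can be sketched the same way in Python, assuming (for illustration only) that each block's size is drawn at random from 4 and 6. Because every block is completed, T and C still balance at the end of each block, but someone who does not know the current block's size cannot be sure when a block ends. The function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def variable_block_schedule(n_participants: int,
                            block_sizes: Sequence[int] = (4, 6),
                            seed: Optional[int] = None) -> List[str]:
    """Permuted-block list whose block size is drawn at random from block_sizes.

    Whole blocks are always completed, so the list may be slightly longer than
    n_participants; T and C are balanced at the end of every block.
    """
    if any(size % 2 for size in block_sizes):
        raise ValueError("every block size must be even for 1:1 allocation")
    rng = random.Random(seed)
    schedule: List[str] = []
    while len(schedule) < n_participants:
        size = rng.choice(block_sizes)  # the size used is not disclosed to investigators
        block = ["T"] * (size // 2) + ["C"] * (size // 2)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


schedule = variable_block_schedule(20, seed=3)
print(len(schedule), "".join(schedule))
```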
130
Subgroup balance
• Sometimes want to balance treatment assignments within subgroups
• Especially important if subgroup size is small
• E.g., with 6 diabetics in a trial, with a complete randomisation, there is 22% chance of 5-1 or 6-0 split!
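The 22% figure follows from the binomial distribution with n = 6 and p = 0.5: the lopsided splits are 6-0, 5-1, 1-5 and 0-6, which together account for 14 of the 64 equally likely assignment patterns. A short exact check in Python:

```python
from fractions import Fraction
from math import comb

n = 6  # diabetics, each assigned T or C by an independent fair coin toss
lopsided_ways = sum(comb(n, k) for k in (0, 1, 5, 6))  # 0-6, 1-5, 5-1 and 6-0 splits
p_lopsided = Fraction(lopsided_ways, 2**n)
print(lopsided_ways, p_lopsided, f"{float(p_lopsided):.0%}")
```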
131
Stratified Randomisation
• To avoid this problem could stratify the randomisation (use blocked randomisation separately for factors such as diabetics & nondiabetics)
• E.g., for blocks of size 6,
Diabetics Nondiabetics
CTTCCT TTCTCC TCCTTC…
132
Stratified Block randomisation
• Typical examples of such factors are age group, severity of condition, and treatment centre. Stratification simply means having separate block randomisation schemes for each combination of characteristics (‘stratum’)
133
Stratified Block randomisation
• For example, in a study where you
expect treatment effect to differ with age and sex you may have four strata:
male over 65,
male under 65,
female over 65
female under 65
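Stratified block randomisation amounts to one permuted-block list per stratum. A Python sketch for the four age/sex strata above, assuming blocks of size 4 and a hypothetical `assign` helper that gives each newly enrolled patient the next unused assignment in their stratum's list:

```python
import random
from itertools import product
from typing import Dict, List, Optional, Tuple

Stratum = Tuple[str, str]


def stratified_schedules(strata: List[Stratum], n_blocks: int, block_size: int = 4,
                         seed: Optional[int] = None) -> Dict[Stratum, List[str]]:
    """A separate permuted-block allocation list for each stratum."""
    rng = random.Random(seed)
    schedules: Dict[Stratum, List[str]] = {}
    for stratum in strata:
        schedule: List[str] = []
        for _ in range(n_blocks):
            block = ["T"] * (block_size // 2) + ["C"] * (block_size // 2)
            rng.shuffle(block)
            schedule.extend(block)
        schedules[stratum] = schedule
    return schedules


strata = list(product(["male", "female"], ["over 65", "under 65"]))  # the four strata
schedules = stratified_schedules(strata, n_blocks=2, seed=11)
next_slot = {stratum: 0 for stratum in strata}  # next unused position in each list


def assign(sex: str, age_group: str) -> str:
    """Hypothetical helper: next assignment from the patient's own stratum list."""
    stratum = (sex, age_group)
    arm = schedules[stratum][next_slot[stratum]]
    next_slot[stratum] += 1
    return arm


print(assign("female", "over 65"), assign("male", "under 65"), assign("female", "over 65"))
```

Because each stratum has its own blocks, treatment and control stay balanced within every age/sex combination, not just overall.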
134
Stratification
• If we believe that gender is a prognostic factor, that is, the treatment effect for males may be different than the treatment effect for females then we should stratify the randomisation (and the analysis) on gender
• This does not mean that we need identical numbers of males and females in the trial, but rather that the males be equally distributed between treatment and control and the females also be equally distributed between treatment and control
135
Stratification
• Example:
– In RA trials there are usually about 70% females and 30% males.
– Stratification at randomisation would help ensure that each treatment group had about 70% females and 30% males.
• If we believe that males and females may have different responses to treatment, this would be important.
136
Blinding
137
Blinding
• Many potential problems can be avoided if everyone involved in the study is blinded to the actual treatment the patient is receiving.
• Blinding (also called masking or concealment of treatment) is intended to avoid bias caused by subjective judgment in reporting, evaluation, data processing, and analysis due to knowledge of treatment.
138
Hierarchy of Blinding
• open label: no blinding
• single blind: patient blinded to treatment
• double blind: patient and assessors blinded to treatment
• complete blind: everyone involved in the study blinded to treatment
139
Open Label Studies
These may be useful for
• pilot studies
• dose ranging studies
However knowledge of treatment can lead to:
• over or under reporting of toxicity
• over estimation of efficacy
Even a small fraction of patients assigned at random to placebo will reduce these potential problems substantially.
140
Single Blind Studies
• Usually justified when it is practically infeasible to blind the investigator
• Patients should be blinded if the endpoints are patient reported outcomes and for safety
• Where possible use blinded assessor to elicit adverse events or patient outcomes
141
Double Blind Studies
• When both the subjects and the investigators are kept from knowing who is assigned to which treatment, the experiment is called “double blind”
• Serve as a standard by which all studies are judged, since it minimizes both potential patient biases and potential assessor biases
142
Double Blinding: Techniques
• Coded treatment groups
• Sham treatments
• If impossible, try to use a blinded assessor for assessing endpoints
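Coded treatment groups can be sketched as follows, assuming the randomized list of arms already exists and that the products themselves look identical (coding alone does not blind if the treatments are distinguishable). The function name `make_coded_kits` and the `KIT-nnnn` code format are hypothetical: each kit carries only a random code, and the key linking codes to treatments is held separately from patients and investigators.

```python
import random

def make_coded_kits(arm_list, seed=7):
    """Label each treatment kit with a random, non-informative code.

    Returns (site_view, unblinding_key): sites and assessors see only the
    codes, while the code -> arm key is held separately until unblinding.
    """
    rng = random.Random(seed)
    codes = rng.sample(range(1000, 10000), len(arm_list))  # unique 4-digit codes
    unblinding_key = {f"KIT-{code}": arm for code, arm in zip(codes, arm_list)}
    site_view = sorted(unblinding_key)  # codes only, no treatment information
    return site_view, unblinding_key

site_view, key = make_coded_kits(["Treatment", "Control"] * 4)
print(site_view)
```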
143
Double Blind Studies: Issues
Side effects:
• Side effects (observable by patient or assessor) are much harder to blind and are one of the major ways in which blinding is broken
Efficacy:
• A truly effective treatment can be recognized by its efficacy in patients
144
Hypothesis Testing
145
Hypothesis Testing
• Steps in hypothesis testing: stating the problem, defining the endpoint, formulating the hypotheses, choosing the statistical test, setting the decision rule, calculating, deciding, and interpreting
• Statistical significance: types of errors, p-value, one-tail vs. two-tail tests, confidence intervals
• Significance vs. non-significance
• Equivalence vs. superiority tests
146
Descriptive and inferential statistics
• Descriptive statistics is devoted to the summarization and description of data (population or sample).
• Inferential statistics uses sample data to make an inference about a population.
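The distinction can be made concrete with a few lines of Python's standard `statistics` module. The pain-score reductions below are hypothetical. The first group of numbers only describes the observed sample; the standard error is the first inferential step, treating the sample mean as an estimate of the unknown population mean.

```python
import statistics

# Hypothetical pain-score reductions for a sample of 8 patients
reductions = [4, 6, 5, 7, 3, 6, 5, 4]

# Descriptive statistics: summarize the sample we actually observed
sample_mean = statistics.mean(reductions)
sample_median = statistics.median(reductions)
sample_sd = statistics.stdev(reductions)  # sample SD (n - 1 denominator)

# Inferential step: treat the sample mean as an estimate of the population
# mean and quantify its uncertainty with the standard error of the mean
standard_error = sample_sd / len(reductions) ** 0.5

print(sample_mean, sample_median, round(sample_sd, 2), round(standard_error, 2))
```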
147
Objectives and Hypotheses
• Objectives are questions that the trial was designed to answer
• Hypotheses are more specific than objectives and are amenable to explicit statistical evaluation
148
Examples of Objectives
• To determine the efficacy and safety of Product ABC in diabetic patients
• To evaluate the efficacy of Product DEF in the prevention of disease XYZ
• To demonstrate that images acquired with product GHI are comparable to images acquired with product JKL for the diagnosis of cancer
149
How do you measure the objectives?
• Endpoints need to be defined in order to measure the objectives of a study.
150
Endpoints: Examples:
• Primary Effectiveness Endpoint –
– Percentage of patients requiring intervention due to pain, where an intervention is defined as:
1. Change in pain medication
2. Early device removal
151
Endpoints: Examples:
• Primary Endpoint:
Percentage of patients with a reduction in pain:
– Reduction from baseline of ≥ 2 points in the Brief Pain Inventory (BPI) worst pain score at 4 weeks.
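An endpoint definition like this translates directly into a calculation. The sketch below assumes complete, paired baseline and week-4 BPI worst-pain scores (no missing data handling); the function name and the five patients' scores are hypothetical. Note that a reduction of exactly 2 points counts as a response because the rule is "≥ 2".

```python
def bpi_responder_rate(baseline, week4, min_reduction=2):
    """Percentage of patients whose BPI worst-pain score dropped by at least
    `min_reduction` points from baseline to week 4."""
    if len(baseline) != len(week4):
        raise ValueError("baseline and week-4 scores must be paired")
    responders = sum(1 for b, w in zip(baseline, week4) if b - w >= min_reduction)
    return 100.0 * responders / len(baseline)

# Hypothetical paired scores for five patients (reductions: 3, 1, 2, 0, 4)
print(bpi_responder_rate([8, 7, 9, 6, 8], [5, 6, 7, 6, 4]))  # prints 60.0
```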
152
Endpoints: Examples
• Patient Survival
– Proportion of patients surviving two years post-treatment
– Average length of survival of patients post-treatment
153
Objectives and Hypotheses
• Primary outcome measure
– of greatest importance in the study
– used for the sample size calculation
– more than one primary outcome measure raises multiplicity issues
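The arithmetic behind multiplicity issues is short. Assuming k independent primary endpoints, each tested at α = 0.05 and with every null hypothesis true, the chance of at least one false-positive finding is 1 − (1 − α)^k. The Bonferroni adjustment (testing each endpoint at α/k) is shown only as one simple, conservative example of a correction; the function names are hypothetical.

```python
def familywise_error_rate(alpha, k):
    """P(at least one false positive) across k independent tests when every
    null hypothesis is true and each test uses significance level alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_level(alpha, k):
    """Per-endpoint significance level under a Bonferroni adjustment."""
    return alpha / k

for k in (1, 2, 3, 5):
    print(k, round(familywise_error_rate(0.05, k), 3), bonferroni_level(0.05, k))
```

With five endpoints the unadjusted chance of a spurious "win" is already above 20%, while testing each at 0.01 keeps it just under 5%.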
154
Hypothesis Testing
• Null Hypothesis (H0)
– Status quo
– Usually the hypothesis of no difference
– The hypothesis to be questioned/disproved
• Alternative Hypothesis (HA)
– Ultimate goal
– Usually the hypothesis of a difference
– The hypothesis of interest
155
Hypothesis Testing
Decision \ Truth        H0 is True          H0 is False
Fail to reject H0       No Error            Type II Error (β)
Reject H0               Type I Error (α)    No Error
Type I Error – Society’s Risk
Type II Error – Sponsor’s Risk
156
Hypothesis testing
• Null Hypothesis – No difference between Treatment and Control
• Type I error, also known as alpha (α)
– The probability of declaring a difference between treatment and control groups even though one does not exist (i.e., the treatment truly does not differ from control)
– As this is “society’s risk”, it is conventionally set at 0.05 (5%)
157
Hypothesis testing
• Type II error, also known as beta (β)
– The probability of not declaring a difference between treatment and control groups even though one does exist (i.e., the treatment truly differs from control)
– 1 − β is the power of the study
• Power is often set at 0.8 (80%), although many companies use 0.9 (90%)
• Underpowered studies have a lower probability of showing a difference if one exists
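A minimal sketch of how power (1 − β) depends on sample size. It assumes a two-sided comparison of two means with a known common SD, equal group sizes and the normal approximation, so power ≈ Φ(|Δ| / (σ·√(2/n)) − z₁₋α/₂). The function name and the values Δ = 2 and σ = 5 are hypothetical.

```python
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference in means; sigma: common SD (assumed known).
    The tiny probability of rejecting in the opposite tail is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    se_diff = sigma * (2 / n_per_group) ** 0.5    # SE of the difference in means
    return std_normal.cdf(abs(delta) / se_diff - z_crit)

for n in (25, 50, 100, 150):
    print(n, round(approx_power(delta=2, sigma=5, n_per_group=n), 2))
```

With these assumed values, about 100 patients per group give roughly 80% power, while 50 per group gives a study that detects the true difference only about half the time.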
158
Steps in Hypothesis Testing
1. Choose the null hypothesis (H0) that is to be tested
2. Choose an alternative hypothesis (HA) that is of interest
3. Select a test statistic and define the rejection region, i.e., the decision rule for when to reject H0
4. Draw a random sample by conducting a clinical trial
159
Steps in Hypothesis Testing
5. Calculate the test statistic and its corresponding p-value
6. Draw a conclusion according to the decision rule pre-specified in step 3
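The six steps can be walked through in code. This sketch assumes a binary endpoint (responder / non-responder), a two-sided test at α = 0.05 and the pooled two-proportion z-test with its large-sample normal approximation; the function name and the responder counts are hypothetical.

```python
from statistics import NormalDist

def two_proportion_test(x_trt, n_trt, x_ctl, n_ctl, alpha=0.05):
    """Two-sided pooled z-test comparing responder proportions in two arms."""
    std_normal = NormalDist()
    # Steps 1-2: H0: p_trt = p_ctl (no difference); HA: p_trt != p_ctl
    # Step 3: decision rule fixed in advance: reject H0 if |z| > z_crit
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Step 4: the trial supplies x responders out of n patients per arm
    p_trt, p_ctl = x_trt / n_trt, x_ctl / n_ctl
    p_pool = (x_trt + x_ctl) / (n_trt + n_ctl)
    se = (p_pool * (1 - p_pool) * (1 / n_trt + 1 / n_ctl)) ** 0.5
    # Step 5: test statistic and its two-sided p-value
    z = (p_trt - p_ctl) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Step 6: conclusion according to the rule from step 3
    reject_h0 = abs(z) > z_crit
    return z, p_value, reject_h0

z, p, reject = two_proportion_test(x_trt=60, n_trt=100, x_ctl=45, n_ctl=100)
print(f"z = {z:.2f}, p = {p:.3f}, reject H0: {reject}")
```

Rejecting when |z| exceeds the critical value is equivalent to rejecting when the p-value is below α, so either form of the rule from step 3 leads to the same conclusion.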
160
Hypothesis Testing – Normal Distribution
161
Test of Significance and p-value
• Statistically significant:
– The conclusion that the results of a study are not likely to be due to chance alone.
– Clinical significance is a separate question from statistical significance: a statistically significant result is not necessarily clinically meaningful.
162
Test of Significance and p-value
p-value
– The probability of observing a relationship (e.g., between variables) or a difference (e.g., between means) at least as strong as the one seen in the sample, assuming that in the population from which the sample was drawn no such relationship or difference exists.
– It is not the probability that a given result is wrong.
163
Test of Significance and p-value
p-value
– The smaller the p-value, the stronger the evidence that the relation observed between variables in the sample reflects a real relation between the respective variables in the population.
164
Test of Significance and p-value
A p-value of .05 (i.e., 1/20) indicates that, if there were no relation between the variables in the population, a relation at least as strong as the one found in our sample would arise “by chance alone” about 5% of the time.
In other words, assuming that in the population there was no relation between those variables whatsoever, and that we repeated experiments like ours one after another, we could expect that in approximately one of every 20 replications the relation between the variables in question would be equal to or stronger than in ours.
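The "one in every 20 replications" reading can be checked by simulation. This sketch assumes a hypothetical experiment comparing two groups of 50 drawn from the same normal population (SD = 1, known), analysed with a two-sided z-test, and repeats it 10,000 times; the function name is hypothetical.

```python
import random
from statistics import NormalDist

def null_p_values(n_experiments=10_000, n_per_group=50, seed=1):
    """Repeat a two-group experiment many times when there is truly no
    difference (both groups drawn from the same normal population, SD = 1)
    and return the two-sided z-test p-value from each repetition."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se_diff = (2 / n_per_group) ** 0.5  # SE of a difference in means when SD = 1
    p_values = []
    for _ in range(n_experiments):
        mean_a = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        mean_b = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        z = (mean_a - mean_b) / se_diff
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values

p_values = null_p_values()
share = sum(p <= 0.05 for p in p_values) / len(p_values)
print(f"Share of null experiments with p <= 0.05: {share:.3f}")
```

The printed share should be close to 0.05: even with no real difference, about 1 experiment in 20 produces a result at least as extreme as the p = .05 threshold.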
165
Sample versus population
166
Estimation
• We use results from our sample to make inferences about the population
– How reliable are the sample data at representing the population data?
– Is the sample mean a good estimate of the population mean?
167
Confidence Intervals
• The results of the analysis are estimates of the “truth” in the population.
• The “average reduction in pain score” is an estimate based on the sample in the study.
• Confidence intervals indicate the precision of the estimate: the wider the confidence interval, the less precise the estimate.
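The link between interval width and precision can be shown with the normal-approximation interval, mean ± z·SD/√n. The sketch assumes a hypothetical observed mean of 10 and SD of 5 and varies only the sample size; for small samples a t-based interval would be somewhat wider. The function name is hypothetical.

```python
from statistics import NormalDist

def normal_ci(sample_mean, sd, n, level=0.95):
    """Approximate CI for a population mean: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * sd / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical: same observed mean and SD, three different study sizes
for n in (25, 100, 400):
    low, high = normal_ci(sample_mean=10.0, sd=5.0, n=n)
    print(f"n={n}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Quadrupling the sample size halves the width of the interval, i.e., the estimate becomes twice as precise.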
168
Confidence Intervals
Example:
• Average reduction in pain score from baseline to month 6 was 9.7 (95% Confidence Interval: 8.3 to 11.1)
• This does not mean that we are 95% sure that the “true” result lies between 8.3 and 11.1; rather, if we were to repeat the study many times with the same sample size and characteristics, about 95% of the confidence intervals calculated this way would contain the “true” mean reduction in pain score
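The repeated-sampling meaning of a 95% confidence interval can be checked by simulation. The sketch assumes a hypothetical true mean reduction of 10 points, an SD of 5 known in advance and 100 patients per study, uses the normal-approximation interval, repeats the study 2,000 times and counts how often the interval captures the true mean. The function name is hypothetical.

```python
import random
from statistics import NormalDist

def ci_coverage(true_mean=10.0, sd=5.0, n=100, n_studies=2_000, seed=3):
    """Simulate n_studies hypothetical studies of size n and return the
    fraction whose 95% CI (normal approximation, SD known) contains true_mean."""
    rng = random.Random(seed)
    half_width = NormalDist().inv_cdf(0.975) * sd / n ** 0.5
    covered = 0
    for _ in range(n_studies):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            covered += 1
    return covered / n_studies

print(f"Fraction of intervals containing the true mean: {ci_coverage():.3f}")
```

The fraction should come out close to 0.95: it is the interval-building procedure that captures the true mean about 95% of the time, while each individual interval either contains it or does not.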
169
What have we learnt?
• Statistics doesn’t have to be frightening.
• Statistics is all about a way of thinking.
• If you don’t have uncertainty, you don’t need statistics.
• p-values are probability statements that tell you something about your experiment.
170
What haven’t we learnt?
• All the detailed theory and formulae that back up everything we have discussed
• How to be a statistician (for that you do have to go to graduate school)
• How to get the perfect answer each time we run a clinical trial:
– We are working with patients, not widgets, and human beings are incredibly complex
171
References
• ICH Guidelines E9, E3 and others
• Senn S. Statistical Issues in Drug Development. John Wiley & Sons; 1997.
• Freedman B. Equipoise and the ethics of clinical research. NEJM 1987;317(3).
• Schulz KF. Subverting randomization in controlled trials. JAMA 1995;274.