Statistics for Non-Statisticians Kay M. Larholt, Sc.D. Vice President, Biometrics & Clinical Operations Abt Bio-Pharma Solutions


2

Topics

1) Basic Statistical Concepts

2) Study Design

3) Blinding and Randomization

4) Hypothesis testing

5) Power and Sample Size

3

Basic Statistical Concepts

4

Statistics

Per the American Heritage dictionary: “The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.”

• Two broad areas
– Descriptive: the science of summarizing data
– Inferential: the science of interpreting data in order to make estimates, test hypotheses, make predictions, or reach decisions about the target population from the sample

5

Introduction to Clinical Statistics

• Statistics - The science of making decisions in the face of uncertainty

• Probability - The mathematics of uncertainty
– The probability of an event is a measure of how likely the event is to happen

6

Sample versus Population

7

Clinical Statistics

• Biostatisticians are statisticians who apply statistics to the biological sciences.

• Clinical statistics are statistics that are applied to clinical trials

8

Basic Statistical Concepts

• Types of data
• Descriptive statistics
• Graphs
• Basic probability concepts
• Type of probability distributions in clinical statistics
• Sample vs. population

9

Types of Data

Qualitative                         Quantitative
Gender – Male/Female                Age – in years
Eye Color – Blue, Brown, Other      Number of children in family
Race                                Height in inches
Diabetic Yes/No                     Annual Salary
                                    WBC count

10

Types of Quantitative Variables

Discrete variables: can only assume certain values and there are usually “gaps” between values.
Example: the number of children in a family (1, 2, 3, ...)

Continuous variables: can assume any value within a specific range.
Example: the time it takes to fly from Boston to New York, the price of a house.

11

Continuous Data

Data should be collected in its “rawest” form. We can always categorize data later. (We can never “uncategorize” data.)
– e.g. If you measure prostate size as part of the clinical trial then capture the size in mm on the CRF.

Patient   Size (mm)
1         24
2         45
3         26
4         23
5         67

We can categorize into 0-20 mm, 21-40 mm, etc. later:

Patient   Category
1         Between 21 and 40
2         Between 41 and 60
3         Between 21 and 40
4         Between 21 and 40
5         Between 61 and 80
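The point about categorizing later can be sketched in a few lines of Python. This is only an illustration: the helper name `categorize` and the 20 mm bin width are assumptions chosen to reproduce the categories on the slide, and the sizes are taken to be whole millimetres.

```python
# Raw prostate sizes in mm, keyed by patient number (values from the slide).
sizes_mm = {1: 24, 2: 45, 3: 26, 4: 23, 5: 67}

def categorize(size_mm, width=20):
    """Hypothetical helper: map a whole-mm size to a label like 'Between 21 and 40'."""
    if size_mm <= width:
        return f"Between 0 and {width}"
    # Bins after the first run 21-40, 41-60, ... for width=20.
    lower = ((size_mm - 1) // width) * width + 1
    return f"Between {lower} and {lower + width - 1}"

# Categories are derived from the raw data; the raw data are never lost.
categories = {patient: categorize(size) for patient, size in sizes_mm.items()}
for patient, label in categories.items():
    print(patient, sizes_mm[patient], label)
```

Going the other way is not possible: a record that only says "Between 21 and 40" cannot be turned back into 24, 26 or 23.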

12

Basic Data Summarization Techniques

• The objective of data summarization is to describe the characteristics of a data set. Ultimately, we want to make the data set more comprehensible and meaningful.

• To put data in a concise form, use:
– Summary descriptive statistics
– Graphs
– Tables

13

Descriptive Statistics for Continuous Variables

Measures of central tendency: Mean, Median, Mode

Measures of dispersion: Range, Variance, Standard deviation

Measures of relative standing: Lower quartile (Q1), Upper quartile (Q3), Interquartile range (IQR)

14

Mean

Arithmetic average: sum of all observations divided by the number of observations.

$$\bar{X} = \frac{\sum X}{N}$$

Example: The average age of a group of 10 people is 24.2 years.

Who are they?

15

Mean

Answer:

• They could be ten “twenty-somethings” who go out to dinner together: Pete aged 24, Jane aged 26, Louise aged 21, Bob aged 22, Julie aged 23, Sue aged 22, Jenn aged 27, John aged 28, Jeff aged 20 and Mark aged 29.

• The mean age for these 10 people is: (24+26+21+22+23+22+27+28+20+29)/10 = 24.2 years

16

Mean

Or alternatively:

• They could be Mr. & Mrs. Smith and their 8 grandchildren: Susie aged 3, Abby aged 5, Max aged 8, Laura aged 10, Joshua aged 10, Emma aged 12, Jane aged 13, Sarah aged 18, Mrs. Smith aged 80, Mr. Smith aged 83.

• The mean age for these 10 people is: (3+5+8+10+10+12+13+18+80+83)/10 = 24.2 years

17

Mean

• Presenting the average alone does not give you much information about the data you are looking at.
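As a small illustration of this point, the two groups of ten from the previous slides can be compared in Python. The variable names are made up for the sketch; the ages are the ones listed on the slides.

```python
from statistics import mean

# Ages from the two example groups on the slides.
friends = [24, 26, 21, 22, 23, 22, 27, 28, 20, 29]
family = [3, 5, 8, 10, 10, 12, 13, 18, 80, 83]

for name, ages in (("friends", friends), ("family", family)):
    # Same mean, very different youngest and oldest members.
    print(name, "mean:", mean(ages), "youngest:", min(ages), "oldest:", max(ages))
```

Both groups print a mean of 24.2, yet one spans ages 20 to 29 and the other 3 to 83.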

18

Median

• The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest.

• There are as many values above the median as below it in the data array.

19

Median

Example: The age of the people in our data set is:

24, 26, 21, 23, 22, 27, 28, 20, 29 (I took out one of the 22 year olds to make this example easier)

Arranging the data in ascending order gives:

20, 21, 22, 23, 24, 26, 27, 28, 29

The median is 24

20

This well-known saying is part of a phrase attributed to Benjamin Disraeli and popularized in the U.S. by Mark Twain

There are three kinds of lies: lies, damned lies, and statistics.

21

Median Home Price

Connecticut: Darien
• Median home price: $1,295,000
• Location: about 40 miles northeast of midtown Manhattan
• Population: 20,209, households 6,592

22

Properties of Mean and Median

• There are unique means and medians for each variable in the data set.

• Median is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.

• Mean is a poor measure of central tendency in skewed distributions.

23

Mode

• The value of the observation that appears most frequently.

Example The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87.

Since the score of 81 occurs the most, the modal score is 81.
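A quick way to check a mode is to count how often each value occurs. The sketch below uses the exam scores from the slide; `Counter` and `multimode` are standard-library tools, and the variable names are illustrative only.

```python
from collections import Counter
from statistics import multimode

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

counts = Counter(scores)    # how many times each score appears
modes = multimode(scores)   # list of the most frequent value(s)

print(counts.most_common(3))
print("modal score(s):", modes)
```

`multimode` returns a list because a data set can have more than one mode.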


24

Averages and What Else?

• As we have seen, just knowing the mean or even the median of a data set does not tell us enough about the data. We need more information to really describe the data.

25

Measures of Dispersion

• Once we know something about the centre of the data we need to understand how the data are dispersed around this centre.

• How variable are the data?

26

Range

Maximum value in the data set minus Minimum value in the data set

1. The age of the patients in our data set is: 21, 25, 19, 20, 22. Range = 25 – 19 = 6

2. The age of the patients in our data set is: 21, 45, 19, 20, 22. Range = 45 – 19 = 26

When max and min are unusual values, range may be a misleading measure of dispersion. The range only uses the 2 extreme values in the data.

27

Variance and Standard Deviation

The variance of a data set measures how far each data point is from the mean of the data set.

It provides a measure of how spread out the data points are

The Standard Deviation is the square root of the variance

28

Variance and Standard Deviation

Variance: Measure of dispersion, the average of the squared deviations of the data from the mean (for a sample, the sum of squared deviations divided by n-1)

Standard deviation: positive square root of the variance

Small std dev: observations are clustered tightly around the mean

Large std dev: observations are scattered widely about the mean

29

Standard Deviation

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

1. Take each observation and subtract the mean of the observations from it
2. Square the answer
3. Sum up all the results
4. Divide by n-1
5. Take the square root

30

Example – Standard Deviation

1. The age of the patients in our data set is: 21, 25, 19, 20, 22

Mean = 21.4, Median = 21, StdDev = 2.302

[Dot plot: 19, 20, 21, 22, 25]

2. The age of the patients in our data set is: 21, 45, 19, 20, 22

Mean = 25.4, Median = 21, StdDev = 11.014

[Dot plot: 19, 20, 21, 22, 45]
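The sample standard deviation recipe (subtract the mean, square, sum, divide by n-1, take the square root) can be checked against the two example data sets. This is a minimal sketch; `sample_sd` is a hypothetical helper name, and the standard library's `stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import mean, median, stdev

def sample_sd(values):
    """Sample standard deviation with the n-1 divisor."""
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]  # subtract mean, square
    return sqrt(sum(squared_deviations) / (len(values) - 1))  # sum, divide, root

ages_1 = [21, 25, 19, 20, 22]
ages_2 = [21, 45, 19, 20, 22]

for ages in (ages_1, ages_2):
    print(mean(ages), median(ages), round(sample_sd(ages), 3), round(stdev(ages), 3))
```

Replacing one 25 with 45 leaves the median at 21 but moves the mean from 21.4 to 25.4 and multiplies the standard deviation almost fivefold.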

31

Choosing an Appropriate Method of Central Tendency

The mean is ordinarily the preferred measure of central tendency. The mean should always be presented along with the variance or the standard deviation.

There are situations when a median might be more appropriate:
- a skewed distribution
- a small number of subjects

32

Measures of Relative Standing

• Descriptive measures that locate the relative position of an observation in relation to the other observations.

33

Measures of Relative Standing

• The pth percentile is a number such that p% of the observations of the data set fall below and (100-p)% of the observations fall above it.

– Lower quartile = 25th percentile (Q1)
– Mid-quartile = 50th percentile (median or Q2)
– Upper quartile = 75th percentile (Q3)
– Interquartile range (IQR = Q3-Q1)

34

Measures of Relative Standing… an Example

The age of the patients in our data set is: 21, 25, 19, 20, 22

Q1 = 20, Q2 = 21, Q3 = 22, IQR = 2

[Dot plot: 19, 20, 21, 22, 25]

The age of the patients in our data set is: 21, 45, 19, 20, 22

Q1 = 20, Q2 = 21, Q3 = 22, IQR = 2

[Dot plot: 19, 20, 21, 22, 45]
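The quartiles in this example can be reproduced with Python's standard library. Note the assumption: several conventions exist for computing quartiles on small samples, and the `method="inclusive"` option is chosen here only because it matches the values shown on the slide. The helper name `quartile_summary` is illustrative.

```python
from statistics import quantiles

def quartile_summary(values):
    """Return (Q1, Q2, Q3, IQR) using the 'inclusive' quartile convention."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

ages_1 = [21, 25, 19, 20, 22]
ages_2 = [21, 45, 19, 20, 22]

print(quartile_summary(ages_1))
print(quartile_summary(ages_2))
```

The extreme value of 45 changes neither the quartiles nor the IQR, which is why these measures are described as robust to outliers.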

35

Definitions

• Statistics - The science of making decisions in the face of uncertainty

• Probability - The mathematics of uncertainty
– The probability of an event is a measure of how likely the event is to happen

36

Basic Probability Concepts

Sample spaces and events

Simple probability

Joint probability

37

Sample Spaces

• Collection of all possible outcomes

Example: All six faces of a die

Example: All 52 cards in a deck

38

Sample Space

Gumballs in a gumball machine

60 red

50 green

40 yellow

30 white

25 pink

20 blue

16 purple

Total: 241 gumballs

39

Events

Simple event: outcome from a sample space with one characteristic

Examples:
– A red card from a deck of cards
– A purple gumball from the gumball machine

Joint event: involves two outcomes simultaneously

Example: An ace that is also red from a deck of cards

40

Events

Mutually exclusive events: two events that cannot occur together

Example: Drawing one card from a deck
A: Drawing a queen of diamonds
B: Drawing a queen of clubs

As only one of these can happen, events A and B are mutually exclusive

41

Probability

• Probability is the numerical measure of the likelihood that an event will occur

• Value is between 0 and 1

[Figure: probability scale running from 0 (impossible) through .5 to 1 (certain)]

42

Computing Probabilities

The probability of an event E:

$$P(E) = \frac{\text{number of event outcomes}}{\text{total number of possible outcomes in the sample space}}$$

Assumes each of the outcomes in the sample space is equally likely to occur

43


Example:

What is the probability of rolling a 4 when you roll a die?

# of possible outcomes in the sample space = 6

# of 4s in the sample space = 1

Prob (rolling a 4 when you roll a die) = 1/6

44


Example:

What is the probability of rolling a six and a four when you roll 2 dice?

# of possible outcomes in the sample space = 36

# of ways to roll one 6 and one 4 = 2

P(one 6 and one 4) = 2/36 ≈ .0556
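The counting argument for two dice can be verified by listing the whole sample space. This is a sketch under the usual assumption that the dice are fair, so all 36 ordered outcomes are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
sample_space = list(product(faces, repeat=2))  # all ordered (die 1, die 2) pairs

# Favourable outcomes: one die shows 6 and the other shows 4, in either order.
favourable = [roll for roll in sample_space if sorted(roll) == [4, 6]]

p_six_and_four = Fraction(len(favourable), len(sample_space))
print(favourable, p_six_and_four, round(float(p_six_and_four), 4))
```

The two favourable outcomes are (4, 6) and (6, 4), which is why the numerator is 2 rather than 1.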

45

Computing Joint Probability

The probability of a joint event, A and B:

$$P(A \text{ and } B) = P(A \cap B) = \frac{\text{number of outcomes from both } A \text{ and } B}{\text{total number of possible outcomes in sample space}}$$

46

Computing Joint Probability

$$P(\text{Red Card and an Ace}) = \frac{\text{2 red aces}}{\text{total \# cards}} = \frac{2}{52} = \frac{1}{26}$$
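The same joint probability can be obtained by building a deck and counting the cards that satisfy both conditions. The representation of the deck below (rank and suit strings, a suit-to-colour mapping) is an assumption made for the sketch.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colour = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}

deck = list(product(ranks, suit_colour))  # 13 ranks x 4 suits

# Joint event: the card is an ace AND the card is red.
red_aces = [(rank, suit) for rank, suit in deck
            if rank == "A" and suit_colour[suit] == "red"]

p_red_ace = Fraction(len(red_aces), len(deck))
print(red_aces, p_red_ace)
```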

47

Type of Probability Distributions in Clinical Statistics

Bernoulli

Binomial

Normal

48

Bernoulli Distribution

The Bernoulli distribution is the “coin flip” distribution.

X is Bernoulli if its probability function is:

$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$

Examples:
– X=1 for heads in coin toss
– X=1 for male in survey
– X=1 for defective in a test of product

49

Binomial Distribution

• The binomial distribution is just n independent Bernoulli variables added up.

• It is the number of “successes” in n trials.

• Probability of success is usually denoted by p, and therefore probability of failure is 1-p.

Example: Number of heads when we flip a coin 10 times. Here n = 10, p=0.5 (the probability of getting a head when we toss the coin once).

50


• The binomial probability function

$$P_X(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$$

Example: X = Number of heads when we flip a coin 10 times. Here X ~ Binomial (n = 10, p=0.5)

n! = n factorial = n × (n-1) × (n-2) × … × 1

10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3,628,800

51


• Expectation

$$E(X) = np$$

• Variance

$$V(X) = np(1-p)$$

X = Number of heads when we flip a coin 10 times. Here X ~ Binomial (n = 10, p=0.5).

Then E(X)=5 (on average we expect to get 5 heads) and Var(X) = 2.5.
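For the coin-flip example, the binomial probabilities, expectation and variance can be computed directly from the definitions. This is a minimal sketch; `binomial_pmf` is a hypothetical helper name, and `math.comb(n, x)` supplies n!/(x!(n-x)!).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Expectation and variance computed from the probability function itself.
expectation = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - expectation) ** 2 * px for x, px in enumerate(pmf))

print(round(pmf[5], 3), expectation, variance)
```

The values computed from the probability function agree with the shortcuts np = 5 and np(1-p) = 2.5.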

52

Gaussian or Normal Distribution aka “Bell Curve”

• Most important probability distribution in the statistical analysis of experimental data.

• Data from many different types of processes follow a “normal” distribution:
– Heights of American women
– Returns from a diversified asset portfolio

• Even when the data do not follow a normal distribution, the normal distribution often provides a good approximation, for example for the distribution of a sample mean when the sample is large.

53

Gaussian or Normal Distribution aka “Bell Curve”

The Normal Distribution is specified by two parameters
– The mean, µ
– The standard deviation, σ

54

Standard Normal Distribution

[Figure: standard normal curve, σ = 1]

55

Characteristics of the Standard Normal Distribution

• Mean µ of 0 and standard deviation σ of 1.
• It is symmetric about 0 (the mean, median and the mode are the same).
• The total area under the curve is equal to one. One half of the total area under the curve is on either side of zero.

56

Area in the Tails of Distribution

• The total area under the curve that is more than 1.96 units away from zero is equal to 5%. Because the curve is symmetrical, there is 2.5% in each tail.

57

Normal Distribution

• 68% of observations lie within ± 1 std dev of mean
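The areas quoted for the standard normal curve can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

# Area more than 1.96 units away from zero (both tails together).
two_tail_area = 2 * (1 - z.cdf(1.96))

# Area within one standard deviation of the mean.
within_one_sd = z.cdf(1) - z.cdf(-1)

print(round(two_tail_area, 3), round(within_one_sd, 3))
```

Rounded to three decimals these are 0.05 and 0.683, matching the 5% and 68% figures on the slides.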



58

Study Design

59


• A population is a whole, and a sample is a fraction of the whole.

• A population is a collection of all the elements we are studying and about which we are trying to draw conclusions.

• A sample is a collection of some, but not all, of the elements of the population

60


61


• To make generalizations from a sample, it needs to be representative of the larger population from which it is taken.

• In the ideal scientific world, the individuals for the sample would be randomly selected. This requires that each member of the population has an equal chance of being selected each time a selection is made.

62

Type of Studies and Study Design

• Phase I – IV

• Controlled vs. non-controlled studies

• Single arm, parallel groups, cross-over designs, and stratified designs

• Selecting an appropriate study design

• Analysis population: Intent-to-treat vs. per-protocol

63

Phases of Clinical Trials

• Clinical trials are generally categorized into four phases.

• An investigational medicine or product may be evaluated in two or more phases simultaneously in different trials, and some trials may overlap two different phases.

64

Phase 1 Studies – Safety and Dosing

• Initial safety trials in which investigators attempt to establish the dose range tolerated by 20-80 healthy volunteers.

• Although usually conducted on healthy volunteers, Phase 1 trials are sometimes conducted with severely ill patients, for example those with cancer or AIDS.

65

Phase 2 Studies – Safety and Limited Efficacy

• Pilot clinical trials to evaluate safety and efficacy in selected populations of about 100-300 patients who have the disease or condition to be treated, diagnosed, or prevented. Often referred to as feasibility studies

• Used as dose finding studies as different doses and regimens are investigated

66

Phase 3 studies - efficacy

• Large “definitive” studies that are carried out once safety has been established and doses that are likely to be effective have been found

• Often called “pivotal” studies

• FDA usually requires 2 Phase III studies for registration

67

Phase 4 studies – post marketing surveillance

• After the product is marketed, Phase 4 studies provide additional details about the product’s safety and efficacy.

• May be used to evaluate formulations, dosages, durations of the treatment, medicine interactions, and other factors.

• Patients from various demographic groups may be studied.

68

Phase 4 studies – post marketing surveillance

• Important part of many Phase 4 studies: detecting and defining previously unknown or inadequately quantified adverse reactions and related risk factors.

• Phase 4 studies are often observational studies rather than experimental.

69

Hierarchy of medical evidence

• From weakest to strongest evidence:
– Case reports
– Case series
– Database studies
– Observational studies
– Controlled clinical trials
– Randomized controlled trials

Byar, 1978

70

Clarke MJ. Ovarian ablation in breast cancer, 1896 to 1998: milestones along hierarchy of evidence from case report to Cochrane review. BMJ 1998; 317

71

Controlled studies

• Studies in which a test article is compared with a treatment that has known effects.

• The control group may receive no treatment, standard treatment or placebo.

72

What is a randomized clinical trial?

• A prospective study in humans
• Randomization
• Comparable control group
• Complete accounting of all cases
• Carefully monitored for safety and efficacy
• Adheres to regulatory requirements: GCP, FDA, ICH guidelines

73

Blinded studies

• Blinded study: one in which the subject or the investigator (or both) is unaware of what trial product a subject is receiving.
– Single-blind study: subjects do not know what treatment they are receiving (active or control)
– Double-blind study: neither the subjects nor the investigators know what treatment a subject is receiving

74

Analysis Populations

75

Intent-to-Treat Principle

• Primary analysis in most randomized clinical trials testing new therapies or devices.

• Requires that any comparison among treatment groups in a randomized clinical trial is based on the results for all subjects in the treatment group to which they were randomly assigned.

• Full analysis: includes compliers and non-compliers

76

Intent-to-Treat

ITT Population includes the following:

All Randomized patients: Preserve initial randomization

- Prevents biased comparison

- Basis for statistical tests and inference

77

Intent-to-Treat

Problems: Predictable or Unpredictable

• Ineligible patients allowed in the trial
• Non-compliance, i.e. not following the assigned treatment
• Patients refusing a trial procedure
• Prohibited medication
• Early withdrawal/termination
• Invalid data

78

Intent-to-Treat

FDA guideline related to regulatory submission states

‘As a general rule, even if the sponsor’s preferred analysis is based on a reduced subset of the patients with data, there should be an additional “intent-to-treat” analysis using all randomized patients.’

Ref: ICH E3: Structure and Content of Clinical Study Reports

79

Intent-to-Treat

When can we exclude randomized patients?

• Failure to satisfy major entry criteria

• Failure to take at least one dose of medication

• Failure to complete procedure

• Lack of any data post-randomization

• Lost to follow up

• Missing data randomly, not related to treatment assignment

80

Intent-to-Treat

Problem: In a 6-month study, what should be done with the patient who drops out and provides no further data after 2 months?

81

Intent-to-Treat

Last Observation Carried Forward (LOCF)

Use last available valid observation post-baseline on a particular variable for the missing visit through the end of study
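The mechanics of LOCF can be sketched as follows. The visit schedule mirrors the one used in this deck, but the observed values are hypothetical and `locf` is an illustrative helper, not a validated imputation routine. Missing visits are represented by `None`, and only post-baseline observations are carried forward.

```python
def locf(post_baseline):
    """Carry the last valid post-baseline observation forward over missing visits."""
    filled = []
    last_valid = None  # stays None until the first valid post-baseline value
    for value in post_baseline:
        if value is not None:
            last_valid = value
        filled.append(last_valid)
    return filled

visits = ["Week 1", "Week 2", "Week 4", "Week 8", "Week 12"]
observed = [18, 16, None, None, None]  # hypothetical patient who drops out after Week 2

imputed = locf(observed)
for visit, value in zip(visits, imputed):
    print(visit, value)
```

In this sketch the Week 2 value of 16 is used for Weeks 4, 8 and 12. A patient with no valid post-baseline value at all keeps `None` throughout, since there is nothing to carry forward.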

82

LOCF – last observation carried forward

[Figure: Y data (axis from 8 to 26) plotted against time at Baseline, Week 1, Week 2, Week 4, Week 8 and Week 12]

83

Last Observation Carried Forward (LOCF)

Biased if the early withdrawal is treatment related

84

Example

The primary analysis sample will be based on the principle of intention-to-treat. All patients who sign the written Informed Consent form, meet the study entry criteria, and undergo randomization will be included in the analysis, regardless of whether or not the assigned treatment device was implanted.

85

Intent-to-Treat Principle

• Using the complete analysis data set:
– Preserves the randomization at the time of analysis, which helps prevent bias
– Provides the foundation for statistical testing.
– Provides estimates of treatment effects which are more likely to mirror those observed in clinical practice.

86

Argument against ITT

• An ITT analysis, by including subjects who were randomized to the drug but received little or no drug, will dilute the treatment effect when compared to the placebo group

87

How can we improve the ITT analysis?

• Careful identification of inclusion/exclusion criteria

• Careful review of reasons for failure, missing data, and exclusions

• Adherence to Good Clinical Practices

• Better monitoring practices to reduce the protocol deviations and non compliance

• Appropriate and detailed statistical plan and analysis

88

Per-Protocol aka Evaluable patient population

Subset of the ITT population who are compliant with the protocol, excluding patients with:

• Major protocol violation/deviation

• Use of prohibited medication as per protocol

• Technical or procedural failure

• Loss to follow up, lack of efficacy/response

• Wrong treatment assignment

89

Per-Protocol Population

Advantages and disadvantages:

• Analysis in its pure form, completely as per the protocol

• Maximize the efficacy from new treatment

• Not a conservative approach; results in bias due to exclusion

90

Per-Protocol Population

Advantages and disadvantages:

• May not have enough power because of the reduced sample size

• Both analyses are done in confirmatory trials

• If the results and conclusions are the same from the two analyses, the confidence is higher.

91

Blinding and Randomization

92

Randomisation

93

History

• The concept of randomisation was introduced by R.A. Fisher in 1926 in the area of agricultural research.

• Before that, clinical trials in the 18th and 19th centuries had used controls from the literature, other historical controls and concurrent controls.

94

Randomisation

• To guard against any use of judgement or systematic arrangements, i.e. to avoid bias

• To provide a basis for the standard methods of statistical analysis such as significance tests

• Assures that treatment groups are balanced (on average) in all regards.
– i.e. balance occurs for known prognostic variables and for unknown or unrecorded variables

95

• Inferential statistics calculated from a clinical trial make an allowance for differences between patients, and this allowance will be correct on average if randomisation has been employed.

96

• Randomisation promotes confidence that we have acted in utmost good faith. It is not to be used as an excuse for ignoring the distribution of known prognostic factors.

• Randomisation is essential for the effective blinding of a clinical trial.

97

Non-Randomised Trials

• It is difficult to obtain a reliable assessment of treatment effect from non-randomised studies.

98

Uncontrolled Trials

• Medical Practice implies that a doctor prescribes a treatment for a patient that in his/her judgement, based on past experience, offers the best prognosis.

• Clinicians are always looking for new therapies, improvements in therapies and alternative therapies.

99

• When a new treatment is proposed some clinicians might try it on a few patients in an uncontrolled trial.

• The new treatment is studied without any direct comparison with a similar group of patients on more standard therapy.

100

• Uncontrolled trials have the potential to provide a very distorted view of therapy.

• Why?

101

Laetrile

• In the 1970s in the US, Laetrile achieved widespread popular support for treating advanced cancer of all types without any formal testing in clinical trials.

• NCI tried to collect documented cases of tumour response after Laetrile therapy. Although an estimated 70,000 cancer patients had tried Laetrile only 93 cases were submitted for evaluation and 6 were judged to have a response.

102

Laetrile

• An uncontrolled trial of 178 patients found no benefit and evidence of cyanide toxicity

• The final conclusion of NCI was that “Laetrile is a toxic drug that is not effective as a cancer treatment”

103

• Uncontrolled trials are much more likely to lead to enthusiastic recommendation of the treatment as compared with properly controlled trials.

104

Historical Controls

• Instead of randomising groups, studies compare the current patients on the new treatment with previous patients who had received the standard treatment.

• This is a Historical Control group.

105

• Major flaw: How can we be sure that the comparison is fair? How do we know whether the 2 groups differ with respect to any feature other than the treatment itself?

106

Patient Selection

• Historical control group is less likely to have clearly defined criteria for patient inclusion because the patients on the standard treatment were not known to be in the clinical trial when their treatment began.

• Historical controls were recruited earlier and possibly from a different source and therefore might be a different type of patients.

• Investigator might be more restrictive in choice of patients for new treatment

107

Concurrent Non-randomised Controls

• Use some pre-determined systematic method or investigator judgement to assign patients to groups

108

Non-Randomised controls

• Date of Birth – odd/even day of birth = new/standard treatment

• Date of presentation – odd/even days = new/standard treatment

• Alternate assignment – odd/even patients= new/standard treatment

109

Example

• Trial of anticoagulant therapy for MI
• Patients admitted on odd days of the month received anticoagulant and patients admitted on even days did not.

        Treated   Control
N       589       442

110

• Is it ethical to randomise?
– Assuming we have sufficient supply of the new treatment, why shouldn’t every new patient be given the new treatment?

111

• Tendency is to do non-randomised trial first and then follow up with RCT.

• However it is difficult to do the RCT if the results from the non-randomised trial are too good.

112

• We assume that the new treatment has a

reasonable chance of being an improvement.

• Before agreeing to enter patients into a randomised trial the investigator must be prepared to stay objective about the treatments involved.

• Randomised trials often produce scientific evidence that contradicts prior beliefs.

113

Equipoise

• What is “equipoise” and why is it important?
– A state of being equally balanced

• Clinical equipoise provides the ethical basis for medical research involving randomly assigning patients to different treatment arms.

114

Clinical Equipoise

Term was first used by B. Freedman in 1987, in the article “Equipoise and the ethics of clinical research”, NEJM 1987 317(3).

“The ethics of clinical research requires equipoise - a state of genuine uncertainty on the part of the clinical investigator regarding the comparative therapeutic merits of each arm in a trial. Should the investigator discover that one treatment is of superior therapeutic merit, he or she is ethically obliged to offer that treatment.”

115

Clinical Equipoise

Freedman suggests that as long as there is genuine uncertainty within the expert medical community about the preferred treatment then there can be clinical equipoise, even if a specific investigator has a preference.

116

Randomisation

117

Randomisation

• Randomised trial with two treatments, A or B

• How do we assign treatments?
– Toss a coin each time: Heads = A, Tails = B
– Random Numbers Table
– Random Permuted Blocks

118

Flip a coin

• Could flip coin for each participant—called complete randomisation or simple randomisation

• Problem: can get imbalance in groups, especially in smaller trials
– Imbalance in prognostic factors more likely
– Inefficient for estimating treatment effect

119

Probability of 5 Treated and 5 Controls in 10 patients

• What is the probability of getting 5 Treated patients out of 10?

• Remember the binomial distribution

120


• The binomial probability function

$$P_X(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$$

X ~ Binomial (n = 10, p=0.5)

In this case, we want x=5

121

Imbalance with 10 Participants

(#T, #C) Probability Efficiency

(5,5) .246 1

(4,6) or (6,4) .410 .96

(3,7) or (7,3) .234 .84

(2,8) or (8,2) .088 .64

(1,9) or (9,1) .020 .36

(0,10) or (10,0) .002 0
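The table can be regenerated from the binomial distribution with n = 10 and p = 0.5. One assumption is flagged here: the slide does not state how efficiency is defined, and the formula 4 × #T × #C / n² is used below only because it reproduces the values shown.

```python
from math import comb

n = 10
rows = []
for treated in range(n // 2, -1, -1):
    control = n - treated
    ways = comb(n, treated)
    if treated != control:
        ways *= 2  # count both (t, c) and (c, t) splits
    probability = ways / 2**n
    # Assumed definition: efficiency relative to a perfectly balanced 5/5 split.
    efficiency = 4 * treated * control / n**2
    rows.append((treated, control, round(probability, 3), efficiency))

for row in rows:
    print(row)
```

Only about a quarter of coin-flip allocations of 10 patients end perfectly balanced, and roughly one in three is 7/3 or worse.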

122

• Even if treatment balanced at end of trial, may be unbalanced at some time

• E.g., may be balanced at end with 400 participants, but first 10 might be

CCCCTCTCTC

123

Random Permuted Blocks

• To balance over time, could randomize in blocks (called random permuted blocks)

• Conceptually, for blocks of size 4: put 2 T labels & 2 C labels in hat: for next 4 participants, draw labels at random without replacement from hat

• TTCC TCTC TCCT CTTC CTCT CCTT all equally likely
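The "labels in a hat" idea translates directly into code: fill a block with equal numbers of T and C, shuffle it, and repeat. This is a teaching sketch only; the function name, the seed handling and the two-arm restriction are assumptions, and a real trial would use a validated randomisation system.

```python
import random

def permuted_block_schedule(n_participants, block_size=4, seed=None):
    """Two-arm allocation list built from randomly permuted blocks."""
    if block_size % 2:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_participants:
        block = ["T", "C"] * (block_size // 2)  # e.g. 2 T labels and 2 C labels
        rng.shuffle(block)                      # draw without replacement from the hat
        schedule.extend(block)
    return schedule[:n_participants]

schedule = permuted_block_schedule(12, block_size=4, seed=2024)
print("".join(schedule))
```

Whatever the seed, every consecutive group of 4 in the printed sequence contains exactly 2 T and 2 C.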

124

Forces balance after every 4

Participant: 1 2 3 4 | 5 6 7 8 | 9 10 11 12
Assignment:  T C T C | C C T T | C T C T

[Figure: each block of 4 is drawn from a hat holding 2 T labels and 2 C labels]

125

Randomisation by blocks – 5 sites, 6 patients per site

Rows are sites, columns are patients within each site.

Site   Patient 1   Patient 2   Patient 3   Patient 4   Patient 5   Patient 6
1      A           A           B           A           B           B
2      B           A           A           A           B           B
3      B           B           B           A           A           A
4      A           B           A           B           A           B
5      A           A           B           B           B           A

126

Incomplete Blocks

• What happens if a site does not enroll all the patients in a block?

• What happens if multiple sites do not enroll all the patients in a block?

127

• The smaller the block size, the more often balance is forced: e.g., in trial of 100,
– blocks of size 2 force balance after every 2
– A block of size 100 forces balance only at end

128

• With blocks of size 2 in an unblinded trial,

we know every second participant’s assignment in advance

• I can veto potential participants until I find one I like (sick one if next assignment is control, healthy one if next patient is treatment)

• Schulz KF Subverting Randomization in Controlled Trials, JAMA 1995 Vol. 274

129

• Even with larger blocks, in unblinded trial you know some assignments in advance

• With blocks of size 8 if first 6 are TCTTCT, we know next 2 are C

• Using a variable block size in a study makes it harder to guess

• Never include the block size in a protocol

130

Subgroup balance

• Sometimes want to balance treatment assignments within subgroups

• Especially important if subgroup size is small

• E.g., with 6 diabetics in a trial, with a complete randomisation, there is 22% chance of 5-1 or 6-0 split!
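The 22% figure follows from the binomial distribution with n = 6 and p = 0.5, counting the splits where at most one diabetic lands in one of the two arms. A short check, with illustrative variable names:

```python
from math import comb

n = 6  # diabetics, each assigned to T or C by a fair coin flip

# Number of diabetics in the treated arm that gives a 5-1 or 6-0 split.
extreme_counts = (0, 1, 5, 6)
p_extreme = sum(comb(n, k) for k in extreme_counts) / 2**n

print(p_extreme)
```

This is 14 of the 64 equally likely assignment patterns, or about 0.22.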

131

Stratified Randomisation

• To avoid this problem could stratify the randomisation (use blocked randomisation separately for factors such as diabetics & nondiabetics)

• E.g., for blocks of size 6,

Diabetics Nondiabetics

CTTCCT TTCTCC TCCTTC…

132

Stratified Block randomisation

• Typical examples of such factors are age group, severity of condition, and treatment centre. Stratification simply means having separate block randomisation schemes for each combination of characteristics (‘stratum’)

133

Stratified Block randomisation

• For example, in a study where you expect treatment effect to differ with age and sex you may have four strata:
– male over 65
– male under 65
– female over 65
– female under 65
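Stratified block randomisation amounts to keeping one independent blocked list per stratum. The sketch below illustrates this for the four age-by-sex strata; the function name, block size, number of blocks and seed are all assumptions made for the example.

```python
import random

def stratified_schedules(strata, blocks_per_stratum=2, block_size=4, seed=None):
    """Build a separate permuted-block allocation list for each stratum."""
    rng = random.Random(seed)
    schedules = {}
    for stratum in strata:
        assignments = []
        for _ in range(blocks_per_stratum):
            block = ["T", "C"] * (block_size // 2)
            rng.shuffle(block)
            assignments.extend(block)
        schedules[stratum] = assignments
    return schedules

strata = ["male over 65", "male under 65", "female over 65", "female under 65"]
schedules = stratified_schedules(strata, seed=7)

for stratum, assignments in schedules.items():
    print(f"{stratum:16s}", "".join(assignments))
```

A new patient is assigned the next unused entry from the list for their own stratum, so treatment and control stay balanced within each combination of age group and sex.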

134

Stratification

• If we believe that gender is a prognostic factor, that is, the treatment effect for males may be different than the treatment effect for females then we should stratify the randomisation (and the analysis) on gender

• This does not mean that we need identical numbers of males and females in the trial, but rather that the males be equally distributed between treatment and control and the females also be equally distributed between treatment and control

135

Stratification

• Example:
– In RA trials there are usually about 70% females and 30% males.
– Stratification at randomisation would help ensure that each treatment group had about 70% females and 30% males.

• If we believe that males and females may have different responses to treatment this would be important.

136

Blinding

137

Blinding

• Many potential problems can be avoided if everyone involved in the study is blinded to the actual treatment the patient is receiving.

• Blinding (also called masking or concealment of treatment) is intended to avoid bias caused by subjective judgment in reporting, evaluation, data processing, and analysis due to knowledge of treatment.

138

Hierarchy of Blinding

• open label: no blinding

• single blind: patient blinded to treatment

• double blind: patient and assessors blinded to treatment

• complete blind: everyone involved in the study blinded to treatment

139

Open Label Studies

These may be useful for

• pilot studies

• dose ranging studies

However knowledge of treatment can lead to:

• over or under reporting of toxicity

• over estimation of efficacy

Even a small fraction of patients assigned at random to placebo will reduce these potential problems substantially.

140

Single Blind Studies

• Usually justified when it is practically infeasible to blind the investigator

• Patients should be blinded if the endpoints are patient reported outcomes and for safety

• Where possible use blinded assessor to elicit adverse events or patient outcomes

141

Double Blind Studies

• When both the subjects and the investigators are kept from knowing who is assigned to which treatment, the experiment is called “double blind”

• Double blind studies serve as the standard by which all studies are judged, since they minimize both potential patient biases and potential assessor biases

142

Double Blinding: Techniques

• Coded treatment groups

• Sham treatments

• If impossible, try to use a blinded assessor for assessing endpoints.

143

Double Blind Studies: Issues

Side effects:

• Side effects (observable by patient or assessor) are much harder to blind and are one of the major ways in which blinding is broken

Efficacy:

• A truly effective treatment can be recognized by its efficacy in patients

144

Hypothesis Testing

145

Hypothesis Testing

• Steps in hypothesis testing: state the problem, define the endpoint, formulate the hypothesis, choose the statistical test, set the decision rule, calculate, decide, and interpret

• Statistical significance: types of errors, p-value, one-tail vs. two-tail tests, confidence intervals

• Significance vs. non-significance

• Equivalence vs. superiority tests

146

Descriptive and inferential statistics

• Descriptive statistics is devoted to the summarization and description of data (population or sample).

• Inferential statistics uses sample data to make an inference about a population.

147

Objectives and Hypotheses

• Objectives are questions that the trial was designed to answer

• Hypotheses are more specific than objectives and are amenable to explicit statistical evaluation

148

Examples of Objectives

• To determine the efficacy and safety of Product ABC in diabetic patients

• To evaluate the efficacy of Product DEF in the prevention of disease XYZ

• To demonstrate that images acquired with product GHI are comparable to images acquired with product JKL for the diagnosis of cancer

149

How do you measure the objectives?

• Endpoints need to be defined in order to measure the objectives of a study.

150

Endpoints: Examples:

• Primary Effectiveness Endpoint –

– Percentage of patients requiring intervention due to pain, where an intervention is defined as :

1. Change in pain medication

2. Early device removal

151

Endpoints: Examples:

• Primary Endpoint:

Percentage of patients with a reduction in pain:

– Reduction in the Brief Pain Inventory (BPI) worst pain scores of ≥ 2 points at 4 weeks over baseline.
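A responder-type endpoint like this one reduces to a simple calculation once each patient's baseline and week-4 scores are known. The following is a minimal sketch with made-up scores; the function name and the data are hypothetical and only illustrate the "reduction of ≥ 2 points" rule.

```python
def responder_rate(baseline, week4, min_reduction=2):
    """Percentage of patients whose worst-pain score fell by at least min_reduction."""
    if len(baseline) != len(week4) or not baseline:
        raise ValueError("need paired, non-empty score lists")
    # A patient is a responder when (baseline - week 4) meets the threshold.
    responders = sum(1 for b, w in zip(baseline, week4) if b - w >= min_reduction)
    return 100.0 * responders / len(baseline)

# Hypothetical BPI worst-pain scores (0-10 scale) for eight patients.
baseline_scores = [8, 7, 9, 6, 5, 8, 7, 10]
week4_scores = [5, 6, 9, 3, 4, 6, 7, 6]

rate = responder_rate(baseline_scores, week4_scores)
print(f"Responders: {rate:.1f}%")
```

Defining the endpoint this precisely before the trial starts is what makes the later hypothesis test unambiguous.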

152

Endpoints: Examples

• Patient Survival

– Proportion of patients surviving two years post-treatment

– Average length of survival of patients post-treatment

153

Objectives and Hypotheses

• Primary outcome measure

– greatest importance in the study

– used for sample size

– More than one primary outcome measure - multiplicity issues
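To see why several primary outcome measures raise a multiplicity issue, it can help to compute the chance of at least one false positive when each outcome is tested at the same level. This sketch assumes the tests are independent, which is a simplification; the Bonferroni adjustment shown is one common correction chosen here for illustration, not a method named in the slides.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the overall rate at or below alpha."""
    return alpha / n_tests

for k in (1, 2, 3, 5):
    unadjusted = familywise_error_rate(0.05, k)
    adjusted = familywise_error_rate(bonferroni_alpha(0.05, k), k)
    print(f"{k} primary outcomes: unadjusted {unadjusted:.3f}, Bonferroni {adjusted:.3f}")
```

With a single primary outcome the overall false-positive risk stays at the nominal level, which is one reason a single primary outcome measure is usually preferred.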

154

Hypothesis Testing

• Null Hypothesis (H0)

– Status Quo

– Usually Hypothesis of no difference

– Hypothesis to be questioned/disproved

• Alternate Hypothesis (HA)

– Ultimate goal

– Usually Hypothesis of difference

– Hypothesis of interest

155

Hypothesis Testing

Decision             H0 is True          H0 is False
Fail to reject H0    No Error            Type II Error (β)
Reject H0            Type I Error (α)    No Error

Type I Error – Society’s Risk

Type II Error – Sponsor’s Risk

156

Hypothesis testing

• Null Hypothesis – No difference between Treatment and Control

• Type I error rate, aka alpha (α), the significance level

– The probability of declaring a difference between treatment and control groups even though one does not exist (i.e., the treatment truly does not differ from control)

– The p-value from the trial is compared with α: a difference is declared when the p-value is below α

– As this is “society’s risk” it is conventionally set at 0.05 (5%)

157

Hypothesis testing

• Type II error rate, aka beta (β)

– The probability of not declaring a difference between treatment and control groups even though one does exist (i.e., the treatment truly differs from control)

– 1 − β is the power of the study

• Power is often set at 0.8 (80%); however, many companies use 0.9

• Underpowered studies have a lower probability of showing a difference if one exists
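The link between α, β, and the number of patients can be made concrete with the textbook normal-approximation formulas for comparing two means. The sketch below assumes a two-sided test, equal group sizes, and a known common standard deviation; the function names and the example numbers (a 5-point difference, SD of 10) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) to detect a true mean difference delta.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    se = sd * sqrt(2 / n_per_group)      # standard error of the difference in means
    z_crit = Z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    return Z.cdf(delta / se - z_crit)

def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Smallest n per group giving at least the requested power."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)

# Hypothetical design: detect a 5-point difference when the SD is 10.
n_80 = sample_size_per_group(delta=5, sd=10, power=0.80)
n_90 = sample_size_per_group(delta=5, sd=10, power=0.90)
print("n per group for 80% power:", n_80)
print("n per group for 90% power:", n_90)
print("power with 30 per group:", round(power_two_sample(5, 10, 30), 2))
```

Moving from 80% to 90% power requires noticeably more patients, and the small-sample case shows how an underpowered study can easily miss a real difference.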

158

Steps in Hypothesis Testing

1. Choose the null hypothesis (H0) that is to be tested

2. Choose an alternative hypothesis (HA) that is of interest

3. Select a test statistic and define the rejection region that determines when to reject H0

4. Draw a random sample by conducting a clinical trial

159

Steps in Hypothesis Testing

5. Calculate the test statistic and its corresponding p-value

6. Draw a conclusion according to the pre-determined rule specified in step 3
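The six steps can be traced through a small worked example. This sketch uses a two-proportion z-test, chosen here only for illustration, with hypothetical counts of 60 responders out of 100 on treatment and 40 out of 100 on control.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x_treat, n_treat, x_ctrl, n_ctrl, alpha=0.05):
    """Two-sided z-test of H0: the response rates in the two groups are equal."""
    # Steps 1-2: H0 is "no difference in response rate"; HA is "a difference".
    # Step 3: test statistic z; reject H0 when the two-sided p-value is below alpha.
    p_treat = x_treat / n_treat
    p_ctrl = x_ctrl / n_ctrl
    pooled = (x_treat + x_ctrl) / (n_treat + n_ctrl)
    se = sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_ctrl))
    # Step 5: calculate the test statistic and its p-value.
    z = (p_treat - p_ctrl) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 6: apply the decision rule fixed in step 3.
    return z, p_value, p_value < alpha

# Step 4: the data come from the trial (hypothetical counts here).
z, p_value, reject = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

The decision rule is fixed before the data are seen, so the conclusion in step 6 follows mechanically from the calculation in step 5.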

160

Hypothesis Testing – Normal Distribution

161

Test of Significance and p-value

• Statistically significant:

– Conclusion that the results of a study are not likely to be due to chance alone.

– Clinical significance is a separate question from statistical significance: a statistically significant difference may still be too small to matter clinically.

162


p-value

– The probability, assuming that no relationship (e.g., between variables) or difference (e.g., between means) exists in the population from which the sample was drawn, of observing by chance alone a relationship or difference at least as large as the one seen in the sample.

– It is not the probability that a given result is wrong.

163


p-value

– The smaller the p-value, the stronger the evidence that the relation observed between variables in the sample reflects a real relation between the respective variables in the population.

164


A p-value of .05 (i.e., 1/20) indicates that, if there were truly no relation between the variables in the population, a relation at least as strong as the one found in our sample would arise “by chance alone” 5% of the time.

In other words, assuming that in the population there was no relation between those variables whatsoever, and we were repeating experiments like ours one after another, we could expect that in approximately one of every 20 replications of the experiment the relation between the variables in question would be equal to or stronger than in ours.
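The replication argument can be checked with a small simulation. The sketch below repeatedly runs a hypothetical two-group experiment in which the null hypothesis is true by construction (both groups come from the same normal distribution with a known SD of 1) and counts how often the p-value falls below 0.05. The group size, number of replications, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def null_rejection_rate(n_experiments=2000, n_per_group=50, alpha=0.05, seed=1):
    """Fraction of experiments with p < alpha when there is truly no difference."""
    rng = random.Random(seed)
    se = sqrt(2 / n_per_group)   # SE of the difference in means when SD = 1
    rejections = 0
    for _ in range(n_experiments):
        # Both groups are drawn from the same population, so H0 is true.
        treatment = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(treatment) - fmean(control)) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / n_experiments

rate = null_rejection_rate()
print(f"Share of null experiments with p < 0.05: {rate:.3f}")
```

The share should land close to 1 in 20, which is the sense in which a significance level of 0.05 caps the rate of false alarms.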

165

Sample versus population

166

Estimation

• We use results from our sample to make inference about the population

– How reliable are the sample data at representing the population data?

– Is the sample mean a good estimation of the population mean?

167

Confidence Intervals

• The results of the analysis are estimates of the “truth” in the population.

• The “average reduction in pain score” is an estimate based on the sample in the study.

Confidence Intervals indicate the precision of the estimate. The wider the confidence interval, the less precise the estimate

168

Confidence Intervals

Example:

• Average reduction in pain score from baseline to month 6 was 9.7 (95% Confidence Interval: 8.3 to 11.1)

• Strictly, this does not mean that there is a 95% probability that the “true” result lies between 8.3 and 11.1. Rather, if we were to repeat the study many times with the same sample size and characteristics and calculate a 95% confidence interval each time, about 95% of those intervals would contain the “true” mean reduction in pain score.
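The repeated-study reading of a confidence interval can be illustrated by simulation. The sketch below uses a normal-approximation interval (mean ± z × SD/√n); for small samples a t-based interval would normally be used instead. The true mean of 10, SD of 5, and sample size of 50 are invented for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(sample) / sqrt(len(sample))
    centre = fmean(sample)
    return centre - half_width, centre + half_width

def coverage(true_mean=10.0, sd=5.0, n=50, n_studies=2000, seed=2):
    """Fraction of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = mean_ci(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / n_studies

covered = coverage()
print(f"Share of intervals containing the true mean: {covered:.3f}")
```

Each simulated study produces a different interval; it is the procedure, applied over many studies, that captures the true mean about 95% of the time.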

169

What have we learnt?

• Statistics doesn’t have to be frightening.

• Statistics is all about a way of thinking

• If you don’t have uncertainty you don’t need statistics

• p-values are probability statements that tell you something about your experiment

170

What haven’t we learnt?

• All the detailed theory and formulae that back up everything we have discussed

• How to be a statistician (for that you do have to go to graduate school)

• How to get the perfect answer each time we run a clinical trial:

– We are working with patients not widgets and human beings are incredibly complex

171

References

• ICH Guidelines E9, E3 and others

• Senn S. Statistical Issues in Drug Development. John Wiley & Sons, 1997

• Freedman B. Equipoise and the ethics of clinical research. NEJM 1987; 317(3)

• Schulz KF. Subverting Randomization in Controlled Trials. JAMA 1995; Vol. 274

172

Thank You !

