Prague 02.10.2008

Overview of the statistical analysis

Jonas Ranstam, PhD, National Musculoskeletal Competence Centre, Lund, Sweden

Explanations and points of reference

1. Methodological background
2. International guidelines
3. Multiplicity issues
4. Study population definitions
5. Statistical models

1. Methodological background

Clinical research

Before 1948

Unclear validity, unknown statistical precision

- Prof A's patients better than Prof B's
- Small series of patients or even single cases

Streptomycin in Tuberculosis Trials Committee. Streptomycin treatment of pulmonary tuberculosis. BMJ 1948;2:769-83.

The Control Scheme

Determination of whether a patient would be treated by streptomycin and bed-rest (S case) or by bed-rest alone (C case) was made by reference to a statistical series based on random sampling numbers drawn up for each sex at each centre by Professor Bradford Hill; the details of the series were unknown to any of the investigators or to the co-ordinator and were contained in a set of sealed envelopes, each bearing on the outside only the name of the hospital and a number.

Clinical research

From 1948

Elimination/reduction of bias, assessment of statistical precision

- Randomization and blinding (intervention studies)
- Effect modeling (observational studies)
- P-values and confidence intervals

Quantitative principles I

Randomized allocation of patients to treatment groups (and blinding when possible) guarantees that:

1. All differences between treatment groups at baseline are random (not systematic).

Complete absence of baseline imbalance is not the aim. Stratification on prognostic factors is used to make the groups less imbalanced.

2. Treatment effect estimates are unaffected by selection and confounding bias (and with blinding, differential misclassification bias).
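The allocation idea can be sketched in code. This is a minimal illustration of stratified randomization with permuted blocks, not the procedure of the 1948 trial (which used random sampling numbers in sealed envelopes). The block size, the strata and the function name `permuted_block_schedule` are assumptions made for the example.

```python
import random

def permuted_block_schedule(n_blocks, block_size=4, arms=("S", "C"), seed=None):
    """Build an allocation list from randomly permuted, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        # Each block holds every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

# One independent schedule per stratum (hypothetical strata: sex within centre).
strata = [(centre, sex) for centre in ("centre 1", "centre 2") for sex in ("F", "M")]
schedules = {
    stratum: permuted_block_schedule(n_blocks=3, seed=i)
    for i, stratum in enumerate(strata)
}

for stratum, schedule in schedules.items():
    print(stratum, "".join(schedule))
```

The fixed seeds are only there to make the sketch reproducible. In a real trial the list is generated once and kept concealed from the investigators.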

Quantitative principles II

1. Individual effects vary between subjects. Different samples of subjects will yield different observed mean effects.

2. The subject variation can be estimated using the observations in a random sample.

3. A universal mean effect can be estimated, and the reliability of this estimate can be described with p-values and confidence intervals.
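These three principles can be illustrated with a small simulation. The population mean, standard deviation and sample size below are hypothetical, and the interval uses the normal approximation (1.96 standard errors), so this is a sketch rather than a recommended analysis.

```python
import random
import statistics

rng = random.Random(2008)
TRUE_MEAN, TRUE_SD, N = 5.0, 10.0, 50  # hypothetical population and sample size

def sample_mean_ci(n):
    """Draw one random sample and return its mean with an approximate 95% CI."""
    effects = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    mean = statistics.mean(effects)
    # Subject variation is estimated from the sample itself.
    se = statistics.stdev(effects) / n ** 0.5
    return mean, mean - 1.96 * se, mean + 1.96 * se

results = [sample_mean_ci(N) for _ in range(1000)]
means = [mean for mean, _, _ in results]

# Different samples give different observed means ...
print("smallest and largest observed mean:", round(min(means), 2), round(max(means), 2))

# ... but most intervals cover the universal mean.
coverage = sum(lo <= TRUE_MEAN <= hi for _, lo, hi in results) / len(results)
print("share of intervals covering the true mean:", coverage)
```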

P-values are often misunderstood

They do

- describe the statistical reliability of findings. By convention, p < 0.05 is considered statistically significant.

They do not

- describe clinical relevance (they depend on sample size).

- show that a difference “does not exist” (“n.s.” is absence of evidence, not evidence of absence).
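The dependence on sample size can be made concrete. The sketch below keeps a hypothetical mean difference and standard deviation fixed and varies only the number of subjects per group. It uses a two-sided z-test with known standard deviation as a simplifying assumption.

```python
import math

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference between two group means (z-test)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = diff / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# Same effect (hypothetical: difference 2.0, SD 10.0), different sample sizes.
p_by_n = {n: two_sided_p(diff=2.0, sd=10.0, n_per_group=n) for n in (10, 100, 1000)}

for n, p in p_by_n.items():
    print(f"n per group = {n:5d}  p = {p:.6f}")
```

The effect is identical in all three rows. Only the largest sample gives p < 0.05, which is why a p-value alone says nothing about clinical relevance and why "n.s." does not show that the difference is absent.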

2. International guidelines

ICMJE – the Vancouver group

Results

“Avoid relying solely on statistical hypothesis testing, such as the use of P values, which fails to convey important information about effect size.”

“When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals).”

Example: FREE SF36-PCS

Estimated treatment effect difference at baseline

Difference (95% CI)     p-value
0.4 (-1.7 to 2.6)       0.7

Estimated treatment effect difference at 1 month

Difference (95% CI)     p-value
5.9 (3.7 to 8.2)        <0.0001
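A confidence interval carries the same information as the p-value, and more. As a rough check, the p-values in the example can be approximately recovered from the reported intervals, assuming a symmetric interval based on the normal distribution. The function name `p_from_ci` is made up for this sketch, and the actual trial analysis may have used a different model.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value from an estimate and its 95% CI."""
    se = (upper - lower) / (2 * z_crit)  # half-width divided by 1.96
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

p_baseline = p_from_ci(0.4, -1.7, 2.6)  # baseline difference
p_month1 = p_from_ci(5.9, 3.7, 8.2)     # difference at 1 month

print(round(p_baseline, 1))  # about 0.7, as reported
print(p_month1 < 0.0001)     # True, as reported
```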

P-values vs. confidence intervals

[Figure: five confidence intervals plotted on an effect axis marked at 0 and at the clinically significant effect]

Interval outcome                                                     P-value
Statistically and clinically significant effect                      p < 0.05
Statistically, but not necessarily clinically, significant effect    p < 0.05
Inconclusive                                                         n.s.
Neither statistically nor clinically significant effect              n.s.
Statistically significant reversed effect                            p < 0.05

P-value: 2 possible outcomes (bad)
Confidence intervals: 5 possible outcomes (good)
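The five interval outcomes can be written as a small decision rule. The sketch assumes that positive values favour the treatment and that a clinically significant effect size has been specified in advance. The function name, the handling of boundary cases and the example numbers are assumptions.

```python
def classify_interval(lower, upper, clinical_threshold):
    """Classify a 95% CI for a treatment effect into one of five outcomes."""
    if lower > upper or clinical_threshold <= 0:
        raise ValueError("need lower <= upper and a positive clinical threshold")
    if upper < 0:
        return "statistically significant reversed effect"
    if lower > 0:
        if lower >= clinical_threshold:
            return "statistically and clinically significant effect"
        return "statistically, but not necessarily clinically, significant effect"
    # From here on the interval includes 0, so the result is n.s.
    if upper >= clinical_threshold:
        return "inconclusive"
    return "neither statistically nor clinically significant effect"

# Hypothetical intervals, with a hypothetical clinically significant effect of 5.
for lower, upper in [(6, 9), (1, 7), (-2, 8), (-2, 3), (-8, -1)]:
    print((lower, upper), "->", classify_interval(lower, upper, clinical_threshold=5))
```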

Clinical trials

International regulatory guidelines

ICH Topic E9 - Statistical Principles for Clinical Trials

EMEA Points to consider:
- baseline covariates
- missing data
- multiplicity issues
- etc.

and similar documents from the FDA

These guidelines can all be found on the internet.

3. Multiplicity issues

Multiplicity

Multiplicity of inferences is present in almost all trials. If not properly handled, unsubstantiated claims for effectiveness may be made as a consequence of an inflated rate of false positive conclusions.

Multiplicity

The chance of at least one false positive finding (FPR) = 1 - (1 - α)^k

where k is the number of performed comparisons and α the significance level (usually 0.05).

k = 1 => FPR = 0.05
k = 2 => FPR = 0.0975
k = 10 => FPR = 0.4013
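The figures above follow directly from the formula, which assumes that the k comparisons are independent. A minimal sketch (the function name is chosen for the example):

```python
def false_positive_rate(k, alpha=0.05):
    """Chance of at least one false positive among k independent comparisons."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 10):
    print(f"k = {k:2d} => FPR = {false_positive_rate(k):.4f}")
```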

Bonferroni method: divide the significance level by the number of comparisons. This reduces the statistical power and should be avoided.
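A short sketch of the Bonferroni method shows both what it achieves and what it costs. The p-values are hypothetical, and the false positive calculation assumes independent comparisons.

```python
def bonferroni_threshold(k, alpha=0.05):
    """Per-comparison significance level under the Bonferroni method."""
    return alpha / k

def bonferroni_reject(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [p <= threshold for p in p_values]

p_values = [0.004, 0.02, 0.03, 0.2]  # hypothetical results of 4 comparisons
print(bonferroni_threshold(len(p_values)))  # each test is judged at 0.05 / 4
print(bonferroni_reject(p_values))

# With 10 independent comparisons the overall false positive rate stays below 0.05.
overall = 1 - (1 - bonferroni_threshold(10)) ** 10
print(round(overall, 4))
```

Two of the hypothetical p-values are below 0.05 but no longer count as significant after correction, which is the loss of power referred to above.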

Endpoints

Primary: The variable capable of providing the most clinically relevant evidence directly related to the primary objective of the trial.

Secondary: Either measurements supporting the primary endpoint or effects related to secondary objectives.

Statistical analyses

Confirmatory: The result concerns a primary endpoint, and the p-value or confidence interval accounts for potential multiplicity. The result can support a claim of superiority, equivalence or non-inferiority.

Exploratory: All other analyses. The result is either supporting or explanatory, or simply just a new hypothesis.

4. Study population definitions

Study populations

Intention-to-treat (ITT) principle: Analyze all randomized subjects according to planned treatment regimen.

Full analysis set (FAS): The set of subjects that is as close as possible to the ideal implied by the ITT principle.

Per protocol (PP) set: The set of subjects who complied with the protocol sufficiently to ensure that they are likely to exhibit the effects of treatment according to the underlying scientific model.

FAS vs. PP-set

FAS
+ no selection bias
- misclassification problem (effect dilution)

PP-set
+ no contamination problem
- possible selection bias (confounding)

When the FAS and PP-set lead to essentially the same conclusions, confidence in the trial is supported.

5. Statistical models

Fixed and random effects

Fixed effects: when the levels of an effect constitute the entire population in which you are interested.

Random effects: when the levels in your experiment represent only a sample from that population.

Random effects models can be used to analyze data with multiple observations per patient.

Mixed effects model

If all the effects in a statistical model (ANOVA) are considered random effects, then the model is called a random effects model; likewise, a model with only fixed effects is called a fixed effects model. When some factors are fixed and others are random, the model is called a mixed model.

(R.A. Fisher 1926: Type-1 and type-2 ANOVA)

[Figure: effect plotted against time (baseline, 1st visit, 2nd visit)]

Data from 3 subjects: Messrs. Green, Blue and Red


Analysis requirement: FAS

1. Assume independence between subjects' repeated observations and use ANOVA

Bad idea: Within-subject variation is confused with between-subject variation. Statistical precision will be incorrectly calculated.
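The problem can be illustrated with simulated data. The sketch below compares the standard error obtained when all repeated observations are treated as independent with the one based on one mean value per subject. The numbers of subjects and visits and the two standard deviations are hypothetical.

```python
import random
import statistics

rng = random.Random(1)
N_SUBJECTS, N_VISITS = 30, 3
BETWEEN_SD, WITHIN_SD = 8.0, 2.0  # hypothetical variation between and within subjects

# Each subject has an individual level plus visit-to-visit noise.
data = []
for _ in range(N_SUBJECTS):
    level = rng.gauss(0.0, BETWEEN_SD)
    data.append([level + rng.gauss(0.0, WITHIN_SD) for _ in range(N_VISITS)])

# Naive: pretend that all 90 observations are independent.
all_obs = [y for subject in data for y in subject]
naive_se = statistics.stdev(all_obs) / len(all_obs) ** 0.5

# Respecting the design: only the 30 subjects are independent units.
subject_means = [statistics.mean(subject) for subject in data]
subject_se = statistics.stdev(subject_means) / N_SUBJECTS ** 0.5

print("naive SE of the overall mean:        ", round(naive_se, 2))
print("subject-based SE of the overall mean:", round(subject_se, 2))
```

In this setting the naive standard error is clearly too small, so p-values and confidence intervals based on it overstate the precision.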

2. Repeated fixed effects comparisons, e.g. Student's t-tests (no FAS)

3. Fixed effects RM-model (no FAS)

4. Fixed effects RM-model with LOCF

LOCF-imputation is not necessarily conservative, and under-estimates variability.

Not the best alternative!
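For reference, LOCF (last observation carried forward) simply repeats a subject's last observed value at every later visit that is missing. A minimal sketch with made-up values for the three subjects:

```python
def locf(values):
    """Replace each missing value (None) with the last observed value, if any."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical scores at baseline, 1st visit and 2nd visit; None = missing.
visits = {
    "Green": [40, 46, 50],
    "Blue": [35, 41, None],
    "Red": [30, None, None],
}

for subject, values in visits.items():
    print(subject, values, "->", locf(values))
```

The imputed values are treated as if they had been observed, although they carry no new information. This is one way to see why variability is under-estimated.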

5. Mixed effects (subject random) ANOVA

Within- and between-subject variation are separated in the model. Statistical precision is correctly calculated.

A number of publications reporting Monte Carlo simulation studies show that this is the best alternative, both in terms of precision and validity!
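The separation of the two sources of variation can be sketched for the simplest case: a balanced design with a random subject effect and no fixed effects, using the classical ANOVA (method of moments) estimators. This is a simplified stand-in, not the model used in the trial. A full mixed effects analysis would add treatment and time as fixed effects, handle missing visits, and is normally fitted with dedicated software. All numbers are hypothetical.

```python
import random
import statistics

rng = random.Random(7)
N_SUBJECTS, N_VISITS = 200, 3
BETWEEN_SD, WITHIN_SD = 8.0, 2.0  # hypothetical true values

# Simulate: subject level (random effect) plus visit-to-visit noise.
data = []
for _ in range(N_SUBJECTS):
    level = rng.gauss(50.0, BETWEEN_SD)
    data.append([level + rng.gauss(0.0, WITHIN_SD) for _ in range(N_VISITS)])

subject_means = [statistics.mean(subject) for subject in data]
grand_mean = statistics.mean(subject_means)

# Mean squares of a one-way ANOVA with subject as the grouping factor.
ss_within = sum(
    (y - mean) ** 2 for subject, mean in zip(data, subject_means) for y in subject
)
ms_within = ss_within / (N_SUBJECTS * (N_VISITS - 1))
ms_between = (
    N_VISITS * sum((mean - grand_mean) ** 2 for mean in subject_means)
    / (N_SUBJECTS - 1)
)

# Variance components: within-subject and between-subject.
within_var = ms_within
between_var = max(0.0, (ms_between - ms_within) / N_VISITS)

print("within-subject variance: ", round(within_var, 1), "(true value 4.0)")
print("between-subject variance:", round(between_var, 1), "(true value 64.0)")
```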


Example: FREE SF36-PCS

Estimated treatment effect difference at 1 month

Method                  Difference   p-value

ITT-analysis
  ME ANOVA              5.5          <0.0001

PP-analysis
  FE ANOVA Compl.       5.2          <0.0001
  FE ANOVA LOCF         4.9          <0.0001

Thank you for your attention!
