
Design of Statistical Investigations

Stephen Senn

9 Unbalanced Designs

Lack of Orthogonality

• So far we have been considering “balanced designs”
– for example, every treatment appears equally frequently in every block

• Sometimes we do not have such balance
– by accident
  • missing observations
– by design

Consequences

• Some loss of efficiency
– compared to some theoretical optimum
  • CAUTION: this may not be obtainable in practice and may be why an unbalanced design has been chosen

• Complications in analysis
– Sums of squares may depend on what other terms have been fitted
  • so far only the residual sum of squares has had this property

Exp_11: Senn 2002 Example 5.1

• Cross-over trial in asthma

• Comparison of salbutamol, formoterol, placebo

• Trial run in six sequences

• Unequal numbers of patients per sequence

Exp_11: Sequences and Periods

Number of Observations

Sequence    I    II   III
FSP         5    5    5
SPF         3    3    3
PFS         6    6    6
FPS         6    6    6
SFP         5    5    5
PSF         5    5    5

Patients by Sequence

FSP   SPF   PFS   FPS   SFP   PSF
5     3     6     6     5     5

Note that although there are no missing data due to patients not having completed a sequence, the numbers of patients are unbalanced by sequence.
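The imbalance by sequence carries over to the treatment-by-period layout. As a minimal sketch in Python (the variable names are illustrative and not part of the original slides), the patients-per-sequence counts can be turned into the number of times each treatment is given in each period:

```python
from collections import Counter

# Patients per sequence, copied from the Exp_11 table.
patients_by_sequence = {"FSP": 5, "SPF": 3, "PFS": 6, "FPS": 6, "SFP": 5, "PSF": 5}

# counts[(treatment, period)] = number of patients given that treatment in that period
counts = Counter()
for sequence, n in patients_by_sequence.items():
    for period, treatment in enumerate(sequence, start=1):
        counts[(treatment, period)] += n

for treatment in "FSP":
    print(treatment, [counts[(treatment, period)] for period in (1, 2, 3)])
# F [11, 11, 8]
# S [8, 10, 12]
# P [11, 9, 10]
```

Each treatment is given 30 times in total, but not 10 times in each period, so treatment and period are not orthogonal in this trial.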

[Figure: Exp_11. Forced expiratory volume in one second (axis from 1000 to 3000) by treatment (F, S, P), one panel per patient, panels grouped by sequence: FSP, SPF, PFS, FPS, SFP, PSF.]

Exp_11: Data

Patient  Sequence      I     II    III
   1     FSP        3500   3200   2900
  10     FSP        3400   2800   2200
  17     FSP        2300   2200   1700
  21     FSP        2300   1300   1400
  23     FSP        3000   2400   1800
   4     SPF        2200   1100   2600
   8     SPF        2800   2000   2800
  16     SPF        2400   1700   3400
   6     PFS        2200   2500   2400
   9     PFS        2200   3200   3300
  13     PFS         800   1400   1000
  20     PFS         950   1320   1480
  26     PFS        1700   2600   2400
  31     PFS        1400   2500   2200
   2     FPS        3100   1800   2400
  11     FPS        2800   1600   2200
  14     FPS        3100   1600   1400
  19     FPS        2300   1500   2200
  25     FPS        3000   1700   2600
  28     FPS        3100   2100   2800
   3     SFP        2100   3200   1000
  12     SFP        1600   2300   1600
  18     SFP        1600   1400    800
  24     SFP        3100   3200   1000
  27     SFP        2800   3100   2000
   5     PSF         900   1900   2900
   7     PSF        1500   2600   2000
  15     PSF        1200   2200   2700
  22     PSF        2400   2600   3800
  30     PSF        1900   2700   2800

Exp_11: Not Fitting Period

> fit1 <- lm(fev1 ~ patient + treat)
> summary(fit1, corr = F)

Coefficients:
              Value  Std. Error   t value  Pr(>|t|)
…...
treatS    -424.6667     87.3127   -4.8637    0.0000
treatP   -1099.0000     87.3127  -12.5869    0.0000

Residual standard error: 338.2 on 58 degrees of freedom

Multiple R-Squared: 0.8569
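Because every patient receives all three treatments, treatment is orthogonal to patient, so when period is left out the fitted treatment effects are simply differences of raw treatment means. A minimal sketch in Python (not the S-PLUS used in these slides; the dictionary layout and names are illustrative assumptions, readings are taken to be listed in period order, and F is assumed to be the reference level, as the labels treatS and treatP suggest) reproduces the two coefficients from the Exp_11 data:

```python
# FEV1 readings per patient in period order (I, II, III), keyed by sequence.
data = {
    "FSP": [(3500, 3200, 2900), (3400, 2800, 2200), (2300, 2200, 1700),
            (2300, 1300, 1400), (3000, 2400, 1800)],
    "SPF": [(2200, 1100, 2600), (2800, 2000, 2800), (2400, 1700, 3400)],
    "PFS": [(2200, 2500, 2400), (2200, 3200, 3300), (800, 1400, 1000),
            (950, 1320, 1480), (1700, 2600, 2400), (1400, 2500, 2200)],
    "FPS": [(3100, 1800, 2400), (2800, 1600, 2200), (3100, 1600, 1400),
            (2300, 1500, 2200), (3000, 1700, 2600), (3100, 2100, 2800)],
    "SFP": [(2100, 3200, 1000), (1600, 2300, 1600), (1600, 1400, 800),
            (3100, 3200, 1000), (2800, 3100, 2000)],
    "PSF": [(900, 1900, 2900), (1500, 2600, 2000), (1200, 2200, 2700),
            (2400, 2600, 3800), (1900, 2700, 2800)],
}

# Total FEV1 under each treatment; the letter at position i of the
# sequence label is the treatment given in period i.
totals = {"F": 0, "S": 0, "P": 0}
n_patients = 0
for sequence, rows in data.items():
    for row in rows:
        n_patients += 1
        for treatment, value in zip(sequence, row):
            totals[treatment] += value

treat_s = (totals["S"] - totals["F"]) / n_patients
treat_p = (totals["P"] - totals["F"]) / n_patients
print(round(treat_s, 4), round(treat_p, 4))  # -424.6667 -1099.0
```

Once period is added to the model this shortcut no longer applies, because treatment and period are not orthogonal here.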

Exp_11: Fitting Period

> fit2 <- update(fit1, . ~ . + period)
> summary(fit2, corr = F)

Call: lm(formula = fev1 ~ patient + treat + period)

Coefficients:
                Value  Std. Error   t value  Pr(>|t|)
…...
treatS      -422.6220     88.2647   -4.7881    0.0000
treatP     -1103.4638     87.8208  -12.5649    0.0000
periodII    -109.7228     87.8208   -1.2494    0.2167
periodIII    -42.7659     88.2647   -0.4845    0.6299

Residual standard error: 339.4 on 56 degrees of freedom
Multiple R-Squared: 0.8608

Exp_11: ANOVA

> aov.1 <- aov(fev1 ~ patient + treat)
> summary(aov.1)
           Df  Sum of Sq  Mean Sq   F Value         Pr(F)
  patient  29   21279472   733775   6.41677  1.065573e-009
    treat   2   18428682  9214341  80.57832  0.000000e+000
Residuals  58    6632451   114353

> aov.2 <- aov(fev1 ~ patient + period)
> summary(aov.2)
           Df  Sum of Sq   Mean Sq   F Value      Pr(F)
  patient  29   21279472  733774.9  1.703663  0.0424500
   period   2      80282   40141.1  0.093199  0.9111486
Residuals  58   24980851  430704.3

Exp_11: ANOVA

> aov.3 <- aov(fev1 ~ patient + period + treat)
> summary(aov.3)
           Df  Sum of Sq  Mean Sq   F Value      Pr(F)
  patient  29   21279472   733775   6.37115  0.0000000
   period   2      80282    40141   0.34853  0.7072422
    treat   2   18531248  9265624  80.45067  0.0000000
Residuals  56    6449603   115171

> aov.4 <- aov(fev1 ~ patient + treat + period)
> summary(aov.4)
           Df  Sum of Sq  Mean Sq   F Value      Pr(F)
  patient  29   21279472   733775   6.37115  0.0000000
    treat   2   18428682  9214341  80.00540  0.0000000
   period   2     182848    91424   0.79381  0.4571415
Residuals  56    6449603   115171

Exp_11: ANOVA

> ssType3(aov.3)
Type III Sum of Squares
           Df  Sum of Sq  Mean Sq   F Value      Pr(F)
  patient  29   21279472   733775   6.37115  0.0000000
   period   2     182848    91424   0.79381  0.4571415
    treat   2   18531248  9265624  80.45067  0.0000000
Residuals  56    6449603   115171

> ssType3(aov.4)
Type III Sum of Squares
           Df  Sum of Sq  Mean Sq   F Value      Pr(F)
  patient  29   21279472   733775   6.37115  0.0000000
    treat   2   18531248  9265624  80.45067  0.0000000
   period   2     182848    91424   0.79381  0.4571415
Residuals  56    6449603   115171
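The Exp_11 ANOVA tables illustrate the dependence on fitting order: the sequential sum of squares for a term changes with what was fitted before it, yet the joint sum of squares for period and treatment, and the total, do not. A minimal sketch in Python, using only figures copied from the summaries of aov.1, aov.3 and aov.4 (the variable names are illustrative):

```python
# Sequential sums of squares copied from the S-PLUS summaries.
patient_ss = 21279472
aov3 = {"period": 80282, "treat": 18531248}   # period fitted before treat
aov4 = {"treat": 18428682, "period": 182848}  # treat fitted before period
residual_with_period = 6449603
residual_without_period = 6632451             # from aov.1 (patient + treat)

# The joint SS for period and treat is the same in either order.
joint_3 = sum(aov3.values())
joint_4 = sum(aov4.values())
print(joint_3, joint_4)  # 18611530 18611530

# Each term gains the same amount when it is adjusted for the other.
print(aov4["period"] - aov3["period"], aov3["treat"] - aov4["treat"])  # 102566 102566

# The total SS is the same whichever model is fitted.
total_full = patient_ss + joint_3 + residual_with_period
total_aov1 = patient_ss + aov4["treat"] + residual_without_period
print(total_full, total_aov1)  # 46340605 46340605
```

The Type III figures are the sums of squares each term would have if fitted last, which is why ssType3 gives the same table for aov.3 and aov.4 apart from row order.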

Exp_11: Standard Errors

Period effect not fitted

$$\mathrm{SE} = \sqrt{114353\left(\frac{1}{30} + \frac{1}{30}\right)} = 87.3127$$

Period effect fitted

$$\mathrm{SE} = 87.8208,\quad \mathrm{SE} = 88.2647 \neq \sqrt{115171\left(\frac{1}{30} + \frac{1}{30}\right)}$$
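The comparison can be checked numerically. The sketch below (Python; the variable names are illustrative, and the mean squares are recomputed from the residual sums of squares and degrees of freedom in the ANOVA tables) evaluates the simple two-means formula for both models:

```python
import math

n = 30  # patients; each receives every treatment once

# Residual mean square = residual SS / residual df, from the ANOVA tables.
mse_no_period = 6632451 / 58
mse_period = 6449603 / 56

naive_no_period = math.sqrt(mse_no_period * (1 / n + 1 / n))
naive_period = math.sqrt(mse_period * (1 / n + 1 / n))

print(round(naive_no_period, 4))  # agrees with the 87.3127 reported by lm
print(round(naive_period, 4))     # below both reported values, 87.8208 and 88.2647
```

With period in the model the simple formula understates both standard errors, because treatment and period are not orthogonal and the treatment estimates are no longer plain differences of means.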


Incomplete Blocks

• These designs arise when the number of treatments exceeds the number of units in a typical block

• Not possible to have every treatment in every block

• Each block receives a subset of the treatments

• These subsets are to be chosen in a sensible manner

Exp_12: Senn 2002 Example 7.2

• Placebo (P) controlled cross-over design to compare two doses of formoterol
– F12: 12 μg in a single puff
– F24: 24 μg in a single puff

• Patients could only be treated in two periods

• Incomplete blocks design

• 24 Patients to be allocated in equal numbers to each of six sequences

Exp_12: Sequences used

P F12

F12 P

P F24

F24 P

F12 F24

F24 F12

The basic design is said to be that of balanced incomplete blocks.

In this context balance has a special meaning: each pair of treatments appears together in the same block equally often.

Because this is a cross-over design and we are worried about period effects, the design is also balanced by period (order), but that is another matter.

Exp_12: The sad reality

• Two incorrect packs were picked up.
– One was for correct sequence
– One was not

Numbers of Observations

Sequence   Period 1   Period 2
F12F24        3          3
F12P          5          5
F24F12        4          4
F24P          4          4
PF12          4          4
PF24          4          4

F12 F24 has one fewer patient

F12 P has one more
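The effect of the mix-up on balance can be seen by counting how many patients (blocks) contain each pair of treatments. A minimal sketch in Python, where the helper `split` and the variable names are hypothetical and introduced only for this illustration:

```python
from collections import Counter

# Patients per sequence as actually run; the plan was 4 per sequence.
actual = {"F12F24": 3, "F12P": 5, "F24F12": 4, "F24P": 4, "PF12": 4, "PF24": 4}
treatments = ("F12", "F24", "P")

def split(sequence):
    """Split a sequence label such as 'F24P' into its two treatments."""
    for first in treatments:
        rest = sequence[len(first):]
        if sequence.startswith(first) and rest in treatments:
            return first, rest
    raise ValueError(f"unrecognised sequence: {sequence}")

# Number of patients in whom each unordered pair of treatments is compared.
pair_counts = Counter()
for sequence, n in actual.items():
    pair_counts[frozenset(split(sequence))] += n

for pair, n in sorted((sorted(p), n) for p, n in pair_counts.items()):
    print(pair, n)
# ['F12', 'F24'] 7
# ['F12', 'P'] 9
# ['F24', 'P'] 8
```

Under the planned allocation every pair would have been compared in 8 patients, so the design as run is no longer balanced in the incomplete-blocks sense.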

Exp_12: The Data

Patient  Sequence  Period 1  Period 2
   6     F12F24     2.500     2.450
  10     F12F24     1.750     1.725
  15     F12F24     1.370     1.120
   4     F12P       3.400     2.500
  11     F12P       2.250     1.925
  14     F12P       1.460     1.260
  21     F12P       1.480     0.880
  23     F12P       2.050     2.100
   2     F24F12     2.700     2.250
  12     F24F12     0.900     0.925
  13     F24F12     1.270     1.010
  24     F24F12     2.150     2.100
   3     F24P       1.750     1.350
   7     F24P       2.525     2.150
  18     F24P       1.080     0.840
  22     F24P       3.120     2.310
   5     PF12       2.500     3.500
   9     PF12       1.600     2.650
  16     PF12       1.750     2.190
  19     PF12       0.640     0.840
   1     PF24       2.100     3.100
   8     PF24       2.300     2.700
  17     PF24       1.030     1.870
  20     PF24       0.810     0.940

[Figure: Exp_12. FEV1 (axis from 1.0 to 3.5) by treatment (F12, F24, P), one panel per patient, panels grouped by sequence: F12F24, F12P, F24F12, F24P, PF12, PF24.]

Exp_12: Analysis 1

> fit1 <- lm(FEV1 ~ patient + period + treat)
> summary(fit1, corr = F)

Call: lm(formula = FEV1 ~ patient + period + treat)
...
Coefficients:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)   2.8164      0.1854  15.1874    0.0000
   patient2  -0.3770      0.2350  -1.6042    0.1236
...
  patient24  -0.7270      0.2350  -3.0933    0.0055
     period   0.0310      0.0667   0.4652    0.6466
   treatF24   0.0402      0.0973   0.4134    0.6835
     treatP  -0.5041      0.0914  -5.5148    0.0000

Exp_12: Analysis 2

> aov1 <- aov(FEV1 ~ patient + period + treat)
> summary(aov1)
           Df  Sum of Sq   Mean Sq   F Value      Pr(F)
  patient  23   22.46280  0.976643  18.37451  0.0000000
   period   1    0.00083  0.000833   0.01568  0.9015459
    treat   2    2.32792  1.163962  21.89871  0.0000073
Residuals  21    1.11619  0.053152

> ssType3(aov1)
Type III Sum of Squares
           Df  Sum of Sq   Mean Sq   F Value      Pr(F)
  patient  23   23.64324  1.027967  19.34011  0.0000000
   period   1    0.01150  0.011501   0.21638  0.6466003
    treat   2    2.32792  1.163962  21.89871  0.0000073
Residuals  21    1.11619  0.053152

Exp_12: Analysis 3

> aov2 <- aov(FEV1 ~ patient + treat + period)
> summary(aov2)
           Df  Sum of Sq   Mean Sq   F Value      Pr(F)
  patient  23   22.46280  0.976643  18.37451  0.0000000
    treat   2    2.31726  1.158628  21.79836  0.0000075
   period   1    0.01150  0.011501   0.21638  0.6466003
Residuals  21    1.11619  0.053152

> ssType3(aov2)
Type III Sum of Squares
           Df  Sum of Sq   Mean Sq   F Value      Pr(F)
  patient  23   23.64324  1.027967  19.34011  0.0000000
    treat   2    2.32792  1.163962  21.89871  0.0000073
   period   1    0.01150  0.011501   0.21638  0.6466003
Residuals  21    1.11619  0.053152
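Two consistency checks tie the Exp_12 tables together. A minimal sketch in Python, using only figures copied from the S-PLUS output (the variable names are illustrative): the sequential sums of squares add to the same model total in either order, and for the single-degree-of-freedom period term the squared t statistic from the regression matches the Type III F value.

```python
# Sequential sums of squares from summary(aov1) and summary(aov2).
aov1 = {"patient": 22.46280, "period": 0.00083, "treat": 2.32792}
aov2 = {"patient": 22.46280, "treat": 2.31726, "period": 0.01150}

model_ss_1 = sum(aov1.values())
model_ss_2 = sum(aov2.values())
print(abs(model_ss_1 - model_ss_2) < 1e-4)  # True, up to rounding in the printed tables

# period has 1 df, so t squared from the regression should equal its Type III F.
t_period = 0.4652       # t value for period in summary(fit1)
f_period_type3 = 0.21638
print(abs(t_period ** 2 - f_period_type3) < 1e-3)  # True
```

Unlike Exp_11, the Type III sum of squares for patient (23.64324) differs from the sequential one (22.46280): with incomplete blocks, patients and treatments are not orthogonal.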

Standard Errors

• Consider the standard error of the contrast F24 versus F12

• This is given as 0.0973

• How could this be calculated?

• There are two sequences in which these drugs could be compared
– F12F24 with 3 patients
– F24F12 with 4 patients

However

$$\sqrt{0.053152\left(\frac{1}{3} + \frac{1}{4}\right)} = 0.1761 \neq 0.0973$$

Thus the standard error we have from fitting the regression model is actually lower than that produced by a naïve argument.
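The arithmetic of the naïve argument can be reproduced directly. A minimal sketch in Python, with illustrative variable names:

```python
import math

mse = 0.053152            # residual mean square from the Exp_12 ANOVA table
n_f12f24, n_f24f12 = 3, 4  # patients in the two sequences that compare F12 and F24

naive_se = math.sqrt(mse * (1 / n_f12f24 + 1 / n_f24f12))
reported_se = 0.0973       # Std. Error of treatF24 in the regression output

print(round(naive_se, 4), reported_se)  # 0.1761 0.0973
```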

Questions: Exp_12

• Why is the SE produced by the regression analysis lower than that produced by using the pooled MSE and the direct comparison of the means?

• What would the treatment estimate be if this naïve approach was used?

• How does it compare to that produced?

• What further information is the regression approach taking into account?

Block Size and Comparisons

Suppose that the block size is k (there are k units per block), that there are b blocks in total, and hence bk units in total.

Suppose that we have v treatments and r replicates. There must also be rv units in total.

Hence rv = bk = N.

Each block permits k(k-1)/2 comparisons. There are bk(k-1)/2 in total.

However, there are v(v-1)/2 possible pair-wise comparisons.

Block Size and Comparisons

Let $\lambda$ be the average number of repetitions of the pair-wise comparisons in the design. Hence

$$\lambda = \frac{bk(k-1)/2}{v(v-1)/2} = \frac{rv(k-1)}{v(v-1)} = \frac{r(k-1)}{v-1}$$

Obviously, unless this is an integer, it will not be possible to “balance” the blocks.

If v-1 is a multiple of k-1 then it becomes particularly easy to balance the blocks
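The counting argument translates into a quick arithmetic check. The sketch below is in Python; the function names `average_pair_replication` and `can_balance` are hypothetical, introduced only for illustration. It computes r = bk/v and r(k-1)/(v-1) exactly and tests whether both are whole numbers. This is a necessary condition only: it does not by itself show that a balanced design exists.

```python
from fractions import Fraction

def average_pair_replication(v, k, b):
    """Return (r, lam) for v treatments in b blocks of k units each.

    r = b*k/v is the replication per treatment and
    lam = r*(k-1)/(v-1) is the average number of times a pair shares a block.
    """
    r = Fraction(b * k, v)
    lam = r * (k - 1) / (v - 1)
    return r, lam

def can_balance(v, k, b):
    """Necessary (not sufficient) condition: r and lam are both integers."""
    r, lam = average_pair_replication(v, k, b)
    return r.denominator == 1 and lam.denominator == 1

# Exp_12 as planned: 3 treatments, 24 patients (blocks), 2 periods each.
r, lam = average_pair_replication(3, 2, 24)
print(r, lam, can_balance(3, 2, 24))  # 16 8 True

# A made-up case for contrast: 5 treatments in 5 blocks of 3 units.
print(average_pair_replication(5, 3, 5)[1], can_balance(5, 3, 5))  # 3/2 False
```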

Exp_13

• It was desired to compare three doses each of two formulations of formoterol to placebo
– ISF6, ISF12, ISF24
– MTA6, MTA12, MTA24
– Placebo

• There are thus seven treatments

• Maximum number of acceptable periods was deemed to be five

Exp_13: Possible solution

• Since 7-1 = 6 is twice 4-1 = 3, use a design in 4 periods

• If seven sequences are used it will also be possible to make the treatments “uniform” on the periods

• There are (7 × 6)/2 = 21 possible pair-wise comparisons of treatments

• Each patient provides (4 × 3)/2 = 6 possible comparisons

• There are 7 × 6 = 42 = 2 × 21 such comparisons per set of seven sequences


I A B C D

II E A F B

III G C A E

IV D F G A

V F G B C

VI B D E G

VII C E D F

A Balanced Design Uniform on the Periods for 7 treatments in 4 periods
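Both properties claimed for this design can be verified by enumeration. A minimal sketch in Python, with the seven sequences I to VII typed in as strings (one letter per period; the variable names are illustrative):

```python
from collections import Counter
from itertools import combinations

# Sequences I to VII of the design, one letter per period.
design = ["ABCD", "EAFB", "GCAE", "DFGA", "FGBC", "BDEG", "CEDF"]

# Balance: every pair of treatments should share a sequence equally often.
pair_counts = Counter(
    frozenset(pair) for block in design for pair in combinations(block, 2)
)
balanced = len(pair_counts) == 21 and set(pair_counts.values()) == {2}

# Uniform on the periods: each treatment appears exactly once in each period.
uniform = all(sorted(column) == list("ABCDEFG") for column in zip(*design))

print(balanced, uniform)  # True True
```

Each of the 21 pairs occurs together in exactly two sequences, which agrees with the count of 42 = 2 × 21 within-patient comparisons per set of seven sequences.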

Questions: Exp_13

Exp_13 was in fact run using five periods and 21 sequences

• Check that such a design can be “balanced”

An alternative considered was to use five periods and seven sequences

• Show that such a design cannot be balanced

• Why might it be preferable to the design in four periods and seven sequences?