1
Statistics
CSE 807
2
Experimental Design and Analysis
How to:
• Design a proper set of experiments for measurement or simulation.
• Develop a model that best describes the data obtained.
• Estimate the contribution of each alternative to the performance.
• Isolate the measurement errors.
• Estimate confidence intervals for model parameters.
• Check if the alternatives are significantly different.
• Check if the model is adequate.
3
Example
• Personal workstation design.
• Processor: 68000, Z80, or 8086.
• Memory size: 512K, 2M, or 8M bytes.
• Number of disks: One, two, three, or four.
• Workload: Secretarial, managerial, or scientific.
• User education: High school, college, or post-graduate level.
4
Terminology
• Response Variable: The outcome of an experiment.
  E.g., throughput, response time.
• Factors: Variables that affect the response variable.
  E.g., CPU type, memory size, number of disk drives, workload used, and user's educational level.
  Also called predictor variables or predictors.
• Levels: The values that a factor can assume.
  E.g., the CPU type has three levels: 68000, Z80, or 8086. The number of disk drives has four levels.
  Also called treatments.
5
Terminology (cont’d)
• Primary Factors: The factors whose effects need to be quantified.
  E.g., CPU type, memory size, and number of disk drives only.
• Secondary Factors: Factors whose impact need not be quantified.
  E.g., the workload.
• Replication: Repetition of all or some experiments.
6
Terminology (cont’d)
• Design: The number of experiments, the factor levels, and the number of replications for each experiment.
  E.g., full factorial design with 5 replications: 3 × 3 × 4 × 3 × 3 = 324 experiments, each repeated five times.
• Experimental Unit: Any entity that is used for experiments.
  E.g., users. Generally, there is no interest in comparing the units.
  Goal: minimize the impact of variation among the units.
7
Terminology (cont’d)
• Interaction => The effect of one factor depends upon the level of the other.

Non-interacting factors:
        A1    A2
  B1     3     5
  B2     6     8

Interacting factors:
        A1    A2
  B1     3     5
  B2     6     9
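The two tables on this slide can be checked numerically: the factors do not interact if moving from A1 to A2 changes the response by the same amount at every level of B. The following is a minimal sketch of that check; the function name `has_interaction` and the tolerance are illustrative choices, not part of the original slides.

```python
def has_interaction(table, tol=1e-9):
    """table[b][a] is the response at level b of factor B and level a of factor A.

    Returns True if the change in response from A1 to A2 differs
    between the levels of B, i.e. the two factors interact.
    """
    effects_of_a = [row[1] - row[0] for row in table]
    return max(effects_of_a) - min(effects_of_a) > tol


# Rows are B1, B2; columns are A1, A2 (values from the slide).
non_interacting = [[3, 5],
                   [6, 8]]
interacting = [[3, 5],
               [6, 9]]

print(has_interaction(non_interacting))  # False: A adds 2 at both B1 and B2
print(has_interaction(interacting))      # True: A adds 2 at B1 but 3 at B2
```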
8
Common Mistakes in Experimentation
1. The variation due to experimental error is ignored.
2. Important parameters are not controlled.
3. Effects of different factors are not isolated.
4. Simple one-factor-at-a-time designs are used.
5. Interactions are ignored.
6. Too many experiments are conducted.
   Better: two phases.
9
Types of Experimental Designs
• Simple Designs: Vary one factor at a time.
  – # of experiments = $1 + \sum_{i=1}^{k} (n_i - 1)$, where $n_i$ is the number of levels of factor $i$.
  – Not statistically efficient.
  – Wrong conclusions if the factors interact.
  – Not recommended.
10
Types of Experimental Designs (cont’d)
• Full Factorial Design: All combinations.
  – # of experiments = $\prod_{i=1}^{k} n_i$
  – Can find the effect of all factors.
  – Too much time and money.
  – May try a $2^k$ design first.
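As a quick illustration of how the two counts compare, the sketch below evaluates the simple-design count, 1 plus the sum of (n_i - 1), and the full factorial count, the product of the n_i, for the workstation example (3 CPUs, 3 memory sizes, 4 disk counts, 3 workloads, 3 education levels). The function names are hypothetical helpers introduced only for this example.

```python
from math import prod


def simple_design_runs(levels):
    """One-factor-at-a-time design: 1 + sum over factors of (n_i - 1)."""
    return 1 + sum(n - 1 for n in levels)


def full_factorial_runs(levels):
    """Full factorial design: product of the numbers of levels."""
    return prod(levels)


# Levels per factor: CPU, memory size, number of disks, workload, education.
levels = [3, 3, 4, 3, 3]

print(simple_design_runs(levels))              # 12
print(full_factorial_runs(levels))             # 324
print(full_factorial_runs([2] * len(levels)))  # 32 runs if each factor is cut to two levels
```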
11
Types of Experimental Designs (cont’d)
• Fractional Factorial Designs:
  – Save time and expense.
  – Less information.
  – May not get all interactions.
  – Not a problem if the interactions are negligible.
12
A Sample Fractional Factorial Design.
Experiment   CPU      Memory   Workload       Educational
Number                Level    Type           Level
1            68000    512K     Managerial     High School
2            68000    2M       Scientific     Post-graduate
3            68000    8M       Secretarial    College
4            Z80      512K     Scientific     College
5            Z80      2M       Secretarial    High School
6            Z80      8M       Managerial     Post-graduate
7            8086     512K     Secretarial    Post-graduate
8            8086     2M       Managerial     College
9            8086     8M       Scientific     High School
13
Exercise
• The performance of a system being designed depends upon the following three factors:
  a. CPU type: 68000, 8086, 80286
  b. Operating system type: CPM, MS-DOS, UNIX
  c. Disk drive type: A, B, C
How many experiments are required to analyze the performance if
  a. There is significant interaction among factors.
  b. There is no interaction among factors.
  c. The interactions are small compared to main effects.
14
2^k Factorial Designs

• k factors, each at two levels.
• Easy to analyze.
• Helps in sorting out the impact of factors.
• Good at the beginning of a study.
• Valid only if the effect is unidirectional.
  E.g., memory size, the number of disk drives.
15
2^2 Factorial Designs
• Two factors, each at two levels.

Performance in MIPS
                 Memory size
Cache size    4M bytes    16M bytes
1K               15           45
2K               25           75

$x_A = -1$ if 4M bytes memory, $+1$ if 16M bytes memory
$x_B = -1$ if 1K bytes cache, $+1$ if 2K bytes cache
16
Model
$y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$
Substituting the four observations:
$15 = q_0 - q_A - q_B + q_{AB}$
$45 = q_0 + q_A - q_B - q_{AB}$
$25 = q_0 - q_A + q_B - q_{AB}$
$75 = q_0 + q_A + q_B + q_{AB}$
Solving:
$y = 40 + 20 x_A + 10 x_B + 5 x_A x_B$
Interpretation:
• Mean performance = 40 MIPS
• Effect of memory = 20 MIPS
• Effect of cache = 10 MIPS
• Interaction between memory and cache = 5 MIPS
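A small sanity check of the fitted model: evaluating y = 40 + 20 x_A + 10 x_B + 5 x_A x_B at the four corner points should reproduce the four measured MIPS values. This is only a sketch; the name `predict` and its keyword parameters are illustrative.

```python
def predict(x_a, x_b, q0=40, q_a=20, q_b=10, q_ab=5):
    """Evaluate the 2^2 model y = q0 + qA*xA + qB*xB + qAB*xA*xB."""
    return q0 + q_a * x_a + q_b * x_b + q_ab * x_a * x_b


# Keys are (x_a, x_b): x_a = -1 for 4M / +1 for 16M memory,
# x_b = -1 for 1K / +1 for 2K cache. Values are the measured MIPS.
observed = {(-1, -1): 15, (1, -1): 45, (-1, 1): 25, (1, 1): 75}

for (x_a, x_b), y in observed.items():
    print((x_a, x_b), predict(x_a, x_b), y)  # prediction equals the observation in every row
```

With four observations and four parameters the model fits exactly, so there is no residual left to estimate experimental error unless the runs are replicated.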
17
Computation of Effects

Experiment    A    B    y
1            -1   -1    y1
2             1   -1    y2
3            -1    1    y3
4             1    1    y4

Model: $y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$
Substitution:
$y_1 = q_0 - q_A - q_B + q_{AB}$
$y_2 = q_0 + q_A - q_B - q_{AB}$
$y_3 = q_0 - q_A + q_B - q_{AB}$
$y_4 = q_0 + q_A + q_B + q_{AB}$
18
Computation of Effects (cont’d)
Solution:
$q_0 = \frac{1}{4}(y_1 + y_2 + y_3 + y_4)$
$q_A = \frac{1}{4}(-y_1 + y_2 - y_3 + y_4)$
$q_B = \frac{1}{4}(-y_1 - y_2 + y_3 + y_4)$
$q_{AB} = \frac{1}{4}(y_1 - y_2 - y_3 + y_4)$
Notice that the effects are linear combinations of the responses.
For $q_A$, $q_B$, and $q_{AB}$ the coefficients sum to zero => contrasts.
Notice (each product is an inner product, divided by 4):
$q_A$ = Column A × Column y
$q_B$ = Column B × Column y
$q_{AB}$ = Column A × Column B × Column y
19
Sign Table Method
  I     A     B    AB      y
  1    -1    -1     1     15
  1     1    -1    -1     45
  1    -1     1    -1     25
  1     1     1     1     75
160    80    40    20     Total
 40    20    10     5     Total/4
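The sign table method amounts to multiplying each sign column by the response column, summing, and dividing by 4. A minimal sketch for the memory-cache data follows; the function name `sign_table_2x2` is a hypothetical helper, and the rows are assumed to be in the order shown in the table.

```python
def sign_table_2x2(y):
    """Return (totals, effects) for a 2^2 design.

    y = [y1, y2, y3, y4] in the row order of the sign table:
    (A, B) = (-1, -1), (1, -1), (-1, 1), (1, 1).
    """
    columns = {
        "I": [1, 1, 1, 1],
        "A": [-1, 1, -1, 1],
        "B": [-1, -1, 1, 1],
    }
    # The AB column is the element-wise product of columns A and B.
    columns["AB"] = [a * b for a, b in zip(columns["A"], columns["B"])]

    totals = {name: sum(s * yi for s, yi in zip(col, y))
              for name, col in columns.items()}
    effects = {name: total / 4 for name, total in totals.items()}
    return totals, effects


totals, effects = sign_table_2x2([15, 45, 25, 75])
print(totals)   # {'I': 160, 'A': 80, 'B': 40, 'AB': 20}
print(effects)  # {'I': 40.0, 'A': 20.0, 'B': 10.0, 'AB': 5.0}
```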
20
Allocation of Variation
• Importance of a factor = proportion of the variation explained
• Sample variance of y:
$$s_y^2 = \frac{\sum_{i=1}^{2^2} (y_i - \bar{y})^2}{2^2 - 1}$$
• Variation of y = numerator of the sample variance = sum of squares total (SST):
$$SST = \sum_{i=1}^{2^2} (y_i - \bar{y})^2$$
21
Allocation of Variation (cont’d)
For a 2^2 design:
$$SST = 2^2 q_A^2 + 2^2 q_B^2 + 2^2 q_{AB}^2$$
• Variation due to A: $SSA = 2^2 q_A^2$
• Variation due to B: $SSB = 2^2 q_B^2$
• Variation due to interaction: $SSAB = 2^2 q_{AB}^2$
SST = SSA + SSB + SSAB
Fraction of variation explained by A = SSA / SST
Note: variation ≠ variance.
22
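Derivation
Model: $y_i = q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi}$
Notice:
1. The sum of entries in each column is zero:
$$\sum_{i=1}^{4} x_{Ai} = 0; \quad \sum_{i=1}^{4} x_{Bi} = 0; \quad \sum_{i=1}^{4} x_{Ai} x_{Bi} = 0$$
2. The sum of the squares of entries in each column is 4:
$$\sum_{i=1}^{4} x_{Ai}^2 = 4; \quad \sum_{i=1}^{4} x_{Bi}^2 = 4; \quad \sum_{i=1}^{4} (x_{Ai} x_{Bi})^2 = 4$$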
23
Derivation (cont’d)
3. The columns are orthogonal (inner product of any two columns is zero):
$$\sum_{i=1}^{4} x_{Ai} x_{Bi} = 0; \quad \sum_{i=1}^{4} x_{Ai} (x_{Ai} x_{Bi}) = 0; \quad \sum_{i=1}^{4} x_{Bi} (x_{Ai} x_{Bi}) = 0$$
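The three properties used in this derivation (zero column sums, column sums of squares equal to 4, and mutual orthogonality) can be verified directly on the sign columns of the 2^2 design. The sketch below does so; the helper `dot` and the variable names are illustrative.

```python
from itertools import combinations

x_a = [-1, 1, -1, 1]
x_b = [-1, -1, 1, 1]
x_ab = [a * b for a, b in zip(x_a, x_b)]  # interaction column
columns = {"A": x_a, "B": x_b, "AB": x_ab}


def dot(u, v):
    """Inner product of two equal-length columns."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Property 1: every column sums to zero.
zero_sum = all(sum(col) == 0 for col in columns.values())
# Property 2: the sum of squares of every column is 4.
squares_are_4 = all(dot(col, col) == 4 for col in columns.values())
# Property 3: any two distinct columns are orthogonal.
orthogonal = all(dot(columns[p], columns[q]) == 0
                 for p, q in combinations(columns, 2))

print(zero_sum, squares_are_4, orthogonal)  # True True True
```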
24
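Derivation (cont’d)
Sample mean:
$$\bar{y} = \frac{1}{4} \sum_{i=1}^{4} y_i = \frac{1}{4} \sum_{i=1}^{4} (q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi})$$
$$= \frac{1}{4} \sum_{i=1}^{4} q_0 + \frac{1}{4} q_A \sum_{i=1}^{4} x_{Ai} + \frac{1}{4} q_B \sum_{i=1}^{4} x_{Bi} + \frac{1}{4} q_{AB} \sum_{i=1}^{4} x_{Ai} x_{Bi}$$
$$= q_0$$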
25
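Derivation (cont’d)
Variation of y:
$$\sum_{i=1}^{4} (y_i - \bar{y})^2 = \sum_{i=1}^{4} (q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi})^2$$
$$= \sum_{i=1}^{4} (q_A x_{Ai})^2 + \sum_{i=1}^{4} (q_B x_{Bi})^2 + \sum_{i=1}^{4} (q_{AB} x_{Ai} x_{Bi})^2 + \text{product terms}$$
$$= q_A^2 \sum_{i=1}^{4} x_{Ai}^2 + q_B^2 \sum_{i=1}^{4} x_{Bi}^2 + q_{AB}^2 \sum_{i=1}^{4} (x_{Ai} x_{Bi})^2 + 0$$
$$= 4 q_A^2 + 4 q_B^2 + 4 q_{AB}^2$$
The product terms vanish because the columns are orthogonal.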
26
Example
Memory-cache study:
$$\bar{y} = \frac{1}{4}(15 + 45 + 25 + 75) = 40$$
Total variation:
$$\sum_{i=1}^{4} (y_i - \bar{y})^2 = 25^2 + 5^2 + 15^2 + 35^2 = 2100 = 4 \times 20^2 + 4 \times 10^2 + 4 \times 5^2$$
• Total variation = 2100
• Variation due to memory = 1600 (76%)
• Variation due to cache = 400 (19%)
• Variation due to interaction = 100 (5%)
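The allocation of variation for the memory-cache study can be reproduced in a few lines. This sketch computes SST directly from the responses and then from the effects (20, 10, 5), so that the two routes can be compared; the dictionary keys and variable names are illustrative.

```python
y = [15, 45, 25, 75]  # measured MIPS for the four runs
effects = {"memory (A)": 20, "cache (B)": 10, "interaction (AB)": 5}

mean = sum(y) / len(y)
# SST from its definition: sum of squared deviations from the mean.
sst = sum((yi - mean) ** 2 for yi in y)

# Each component is 2^2 * q^2 for the corresponding effect.
components = {name: len(y) * q ** 2 for name, q in effects.items()}
percent = {name: 100 * ss / sst for name, ss in components.items()}

print(sst)                       # 2100.0
print(components)                # {'memory (A)': 1600, 'cache (B)': 400, 'interaction (AB)': 100}
print(sum(components.values()))  # 2100, the same as SST
for name, p in percent.items():
    print(f"{name}: {p:.0f}%")   # 76%, 19%, 5%
```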
27
Case Study: Interconnection Net
Memory interconnection networks: Omega and Crossbar.
Memory reference patterns: Random and Matrix.
Fixed factors:
1. Number of processors was fixed at 16.
2. Queued requests were not buffered but blocked.
3. Circuit switching instead of packet switching.
4. Random arbitration instead of round robin.
5. Infinite interleaving of memory => no memory bank contention.
28
2^2 Design for Interconnection Networks
Factors Used in the Interconnection Network Study

Symbol   Factor                  Level -1    Level 1
A        Type of the network     Crossbar    Omega
B        Address pattern used    Random      Matrix

Response
 A    B    Throughput T    90% Transit N    Response R
-1   -1      0.6041              3             1.655
-1    1      0.4220              5             2.378
 1   -1      0.7922              2             1.262
 1    1      0.4717              4             2.190
29
Interconnection Network Study (cont’d)
               Mean Estimate                 Variation Explained
Parameter    T          N       R           T         N       R
q0           0.5725     3.5     1.871
qA           0.0595    -0.5    -0.145       17.2%     20%     10.9%
qB          -0.1257     1.0     0.413       77.0%     80%     87.8%
qAB         -0.0346     0.0     0.051        5.8%      0%      1.3%
30
Interpretation of Results
• Average throughput = 0.5725
• Most effective factor = B = reference pattern
  => The address patterns chosen are very different.
• Reference pattern: effect of -0.1257, explaining 77% of the variation
• Effect of network type = 0.0595
  Omega networks = Average + 0.0595
  Crossbar networks = Average - 0.0595
  Difference between the two = 0.119
• Slight interaction (-0.0346) between reference pattern and network type.
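The reported estimates can be recomputed from the four runs. The sketch below rests on two assumptions that are not stated on the slides: the first throughput value is taken as 0.6041, which is the value implied by the reported mean of 0.5725, and the runs are assigned to factor levels in the way that reproduces the reported q_A and q_B (network type as A, reference pattern as B). Function and variable names are illustrative.

```python
def effects_2x2(y):
    """y maps (x_a, x_b) -> response for the four runs of a 2^2 design."""
    n = len(y)
    q0 = sum(y.values()) / n
    q_a = sum(a * v for (a, b), v in y.items()) / n
    q_b = sum(b * v for (a, b), v in y.items()) / n
    q_ab = sum(a * b * v for (a, b), v in y.items()) / n
    return q0, q_a, q_b, q_ab


def variation_explained(q_a, q_b, q_ab):
    """Percent of SST due to A, B and AB (the common factor 2^2 cancels)."""
    total = q_a ** 2 + q_b ** 2 + q_ab ** 2
    return tuple(100 * q ** 2 / total for q in (q_a, q_b, q_ab))


# Keys are (x_a, x_b): x_a = -1 Crossbar / +1 Omega, x_b = -1 Random / +1 Matrix.
# ASSUMPTION: level assignment and the 0.6041 entry are inferred from the reported estimates.
responses = {
    "T": {(-1, -1): 0.6041, (-1, 1): 0.4220, (1, -1): 0.7922, (1, 1): 0.4717},
    "N": {(-1, -1): 3, (-1, 1): 5, (1, -1): 2, (1, 1): 4},
    "R": {(-1, -1): 1.655, (-1, 1): 2.378, (1, -1): 1.262, (1, 1): 2.190},
}

results = {}
for name, y in responses.items():
    q0, q_a, q_b, q_ab = effects_2x2(y)
    results[name] = (q0, q_a, q_b, q_ab) + variation_explained(q_a, q_b, q_ab)
    print(name,
          " ".join(f"{q:8.4f}" for q in results[name][:4]),
          " ".join(f"{p:5.1f}%" for p in results[name][4:]))
```

Under these assumptions the printed values agree with the parameter table to the precision shown there.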
31
General 2^k Factorial Designs
k factors at two levels each.
2^k experiments.
2^k effects:
  k main effects
  $\binom{k}{2}$ two-factor interactions
  $\binom{k}{3}$ three-factor interactions
  ...
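To see why the effects add up to 2^k, one can count the interactions of each order with binomial coefficients: together with the mean q_0, the k main effects and all higher-order interactions give exactly 2^k quantities. A minimal sketch, with `effect_counts` as a hypothetical helper name:

```python
from math import comb


def effect_counts(k):
    """Number of effects of each order in a 2^k design.

    Order 1 = main effects, order 2 = two-factor interactions, and so on.
    """
    return {order: comb(k, order) for order in range(1, k + 1)}


for k in (2, 3, 5):
    counts = effect_counts(k)
    # The mean q0 plus all effects gives 2^k quantities in total.
    print(k, counts, 1 + sum(counts.values()) == 2 ** k)
```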
32
2^k Design Example
Three factors in designing a machine:
• Cache size
• Memory size
• Number of processors

Symbol   Factor                  Level -1    Level 1
A        Memory size             4MB         16MB
B        Cache size              1kB         2kB
C        Number of processors    1           2
33
2^k Design Example (cont’d)

Cache size       4M Bytes              16M Bytes
              1 Proc    2 Proc      1 Proc    2 Proc
1K Byte         14        46          22        58
2K Byte         10        50          34        86

  I     A     B     C    AB    AC    BC   ABC      y
  1    -1    -1    -1     1     1     1    -1     14
  1     1    -1    -1    -1    -1     1     1     22
  1    -1     1    -1    -1     1    -1     1     10
  1     1     1    -1     1    -1    -1    -1     34
  1    -1    -1     1     1    -1    -1     1     46
  1     1    -1     1    -1     1    -1    -1     58
  1    -1     1     1    -1    -1     1    -1     50
  1     1     1     1     1     1     1     1     86
320    80    40   160    40    16    24     8     Total
 40    10     5    20     5     2     3     1     Total/8
34
Analysis
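$$SST = 2^3 (q_A^2 + q_B^2 + q_C^2 + q_{AB}^2 + q_{AC}^2 + q_{BC}^2 + q_{ABC}^2)$$
$$= 8 (10^2 + 5^2 + 20^2 + 5^2 + 2^2 + 3^2 + 1^2)$$
$$= 800 + 200 + 3200 + 200 + 32 + 72 + 8 = 4512$$
Fractions of variation explained by A, B, C, AB, AC, BC, and ABC:
18% + 4% + 71% + 4% + 1% + 2% + 0% = 100%
Number of processors (C) is the most important factor.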
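The sign table method generalizes to any 2^k design: generate the sign rows in standard order, form each interaction column as a product of factor columns, and divide each column total by 2^k. The following sketch applies this to the 2^3 example; `sign_table_effects` is a hypothetical helper, and the responses are assumed to be listed with factor A varying fastest, as in the table above.

```python
from itertools import combinations, product
from math import prod


def sign_table_effects(factors, y):
    """Effects of a 2^k design via the sign table method.

    factors: factor names, e.g. "ABC".
    y: responses in standard order (first factor varies fastest).
    """
    k = len(factors)
    # itertools.product varies the last position fastest, so reverse each row.
    rows = [levels[::-1] for levels in product((-1, 1), repeat=k)]
    if len(y) != len(rows):
        raise ValueError("expected 2^k responses")

    effects = {"I": sum(y) / len(y)}
    for order in range(1, k + 1):
        for idx in combinations(range(k), order):
            name = "".join(factors[i] for i in idx)
            total = sum(prod(row[i] for i in idx) * yi
                        for row, yi in zip(rows, y))
            effects[name] = total / len(y)
    return effects


y = [14, 22, 10, 34, 46, 58, 50, 86]
effects = sign_table_effects("ABC", y)
sst = len(y) * sum(q ** 2 for name, q in effects.items() if name != "I")
shares = {name: 100 * len(y) * q ** 2 / sst
          for name, q in effects.items() if name != "I"}

print(effects)  # I=40, A=10, B=5, C=20, AB=5, AC=2, BC=3, ABC=1
print(sst)      # 4512.0
print({name: round(p) for name, p in shares.items()})
```

The largest share belongs to C, in line with the conclusion that the number of processors is the most important factor.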
35
Exercise
Analyze the 2^3 design:

            A1             A2
         C1     C2      C1     C2
B1      100     15     120     10
B2       40     30      20     50

a. Quantify main effects and all interactions.
b. Quantify percentages of variation explained.
c. Sort the variables in the order of decreasing importance.