October 21, 2011 C. Purdy--Graduate Seminar--Design of Experiments
SECS Seminar: Design of Experiments
• How to frame a hypothesis/thesis
• Theoretical, simulation, and design hypotheses
• Hypotheses/theses
• How to determine important factors for experiments and translate them into experiments with dependent and independent variables
• How to design sets of experiments to collect sufficient data to test a hypothesis
Reporting Results of Experiments
• How to use statistical tools correctly
• How to display results correctly
Prof. Carla Purdy (partially based on material provided by Prof. Hal Carter)
IMPORTANT THINGS TO REMEMBER
• This lecture will be just a brief overview of the experimental method and design of experiments.
• Proper experimental technique relies heavily on the field of STATISTICS. Anyone doing experimental work should have a good working knowledge of statistics.
World Models
Theory: (Classical) Probability
Assume a coin with
P(H) = p, P(T) = 1-p
Assume each coin toss is independent of the others.
After N tosses, the expected number of heads is Np, the variance is Np(1-p), the standard deviation is sqrt(Np(1-p)), ...
Experiment: Real-world Errors
Statistics
Given a coin, toss it N times.
The number of heads is K, where 0 ≤ K ≤ N.
The ratio K/N is the sample mean (the sample estimate of p).
If we repeat the experiment M times, we will have M sample means.
p = ?
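The contrast between the theoretical model and the experiment can be sketched in a few lines of Python. This is only an illustration: the "true" value `p_true`, the sizes `N` and `M`, and the seed are assumptions chosen for the demo, and a real experimenter would not know `p_true`.

```python
import math
import random

def toss_experiment(n_tosses, p, rng):
    """Toss a coin with P(H) = p n_tosses times; return the number of heads K."""
    return sum(1 for _ in range(n_tosses) if rng.random() < p)

p_true = 0.3               # hypothetical "true" P(H); unknown in a real experiment
N, M = 1000, 200           # N tosses per experiment, M repeated experiments
rng = random.Random(2011)  # fixed seed so the experiment can be repeated

# Each experiment yields one sample mean K/N
sample_means = [toss_experiment(N, p_true, rng) / N for _ in range(M)]
p_hat = sum(sample_means) / M

print("theory:     E[K] =", N * p_true,
      " sd[K] =", round(math.sqrt(N * p_true * (1 - p_true)), 2))
print("experiment: estimate of p from", M, "sample means =", round(p_hat, 4))
```

The estimate approaches p only as N and M grow, which is the statistical side of the picture.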
TERMINOLOGY
treatment: procedure, process or algorithm we are studying
problem instance: data point to which we apply the procedure or algorithm
[Figure: the space of treatments versus problem instances; a point in the space is an experiment, and an unexplored area is a missed region]
Introduction--The Three Faces of the Experimenter
Problem Instance / Treatment Space
I tried my treatment on one carefully chosen problem instance. It MUST be the best treatment.
I have to try every combination of problem instance and treatment. I’ll NEVER meet the conference deadline.
I used well-established statistical techniques and design of experiments to minimize cost of the experiments and to maximize confidence in the results.
Experiment: needs a hypothesis
What is a hypothesis?
A hypothesis is an assumption not proved by experiment or observation that is made for the sake of testing its soundness.
--neurolab.isc.nasa.gov/glosseh.htm
Different approaches to experimentation:
• theoretical: use experiment to try to discover a new “law” or formula or model for a process
• simulation: use experiment to understand how a (complex) system works--must have a model to start with
• design: use experiment to design a new component or system
In all cases, we must use the appropriate experimental tools and methods correctly.
What is the experimental method?
experimental method - the use of controlled observations and measurements to test hypotheses
exploratory study - preliminary examination of data/treatment space to develop hypotheses which can be tested through experiment
Cohen, Empirical Methods for Artificial Intelligence, MIT Press, 1995
10 IMPORTANT THINGS TO REMEMBER ABOUT EXPERIMENTS:
1. EXPERIMENTS ARE NOT PROOFS.
2. It is just as important to report NEGATIVE results as to report POSITIVE results.
3. IGNORING IMPORTANT FACTORS CAN LEAD TO ERRONEOUS CONCLUSIONS, SOMETIMES WITH TRAGIC RESULTS.
4. YOUR RESULTS ARE ONLY VALID FOR THE PART OF THE DATA-TREATMENT SPACE YOU HAVE EXPLORED.
5. An experiment is worthless unless it can be REPEATED.
6. YOU ONLY GET ANSWERS TO THE QUESTIONS YOU ASK.
7. You must use a good (pseudo)RANDOM NUMBER GENERATOR
8. An experiment must be repeated a SUFFICIENT NUMBER OF TIMES for the results to be attributed to more than random error
9. You must choose the CORRECT MEASURE for the question you are asking.
10. Reporting CORRECT results, PROPERLY DISPLAYED, is an integral part of a well-done experiment
1. EXPERIMENTS ARE NOT PROOFS:
the coelacanth
The coelacanth is a prehistoric fish which thrived about 400 million years ago and was thought to be the ancestor of certain land animals.
Scientists believed that the coelacanth became extinct about 66 million years ago.
Why did they believe this? “Experimental” evidence from the fossil record and the lack of any “newer” specimens.
BUT: in 1938 a live coelacanth was caught near South Africa. Many more specimens have since been caught.
http://www.austmus.gov.au/fishes/fishfacts/fish/coela.htm
2. It is just as important to report NEGATIVE results as to report POSITIVE results:
Edison and the light bulb
Thomas Edison experimented with thousands of different filaments before he finally found one which would glow for many hours without burning up.
http://www.enchantedlearning.com/inventors/edison/lightbulb.shtml
3. IGNORING IMPORTANT FACTORS CAN LEAD TO ERRONEOUS CONCLUSIONS, SOMETIMES WITH TRAGIC RESULTS:
the Space Shuttle Challenger
On January 28, 1986, the Space Shuttle Challenger exploded during launch, killing its entire crew, including the first “Teacher in Space”.
Eventually, the main cause of the accident was determined to be a failure of the “O-ring” seals on one booster rocket, which did not function well in the extreme cold (about 36°F, 15°F below any previous launch).
“Of 21 launches with ambient temperatures of 61 degrees Fahrenheit or greater, only four showed signs of O-ring thermal distress; i.e., erosion or blow-by and soot. Each of the launches below 61 degrees Fahrenheit resulted in one or more O-rings showing signs of thermal distress.” --Report of the Presidential Commission on the Space Shuttle Challenger Accident, U.S. Government Printing Office: 1986 0-157-336.
http://news.bbc.co.uk/onthisday/hi/dates/stories/january/28/newsid_2506000/2506161.stm
4. YOUR RESULTS ARE ONLY VALID FOR THE PART OF THE DATA-TREATMENT SPACE YOU HAVE EXPLORED:
the Blind Men and the Elephant
... Wall? Spear? Snake? Tree? Rope?
www.plumdigital.com/0_general/blindman.html
5. An experiment is worthless unless it can be REPEATED:
Cold Fusion
In March 1989, Stanley Pons and Martin Fleischmann, University of Utah, announced they had succeeded in creating a method of “tabletop fusion” which would produce large amounts of cheap, clean energy.
“Today the mainstream view is that champions of cold fusion are little better than purveyors of snake oil and good luck charms.” --http://www.spectrum.ieee.org/WEBONLY/resource/sep04/0904nfus.html
Current events: can a neutrino travel faster than light?
http://www.earthtech.org/experiments/case/setup.html
6. YOU ONLY GET ANSWERS TO THE QUESTIONS YOU ASK:
John Snow and the Broad Street map:
What causes cholera?
(Soho, London, 1854)
http://www.winwaed.com/sci/cholera/john_snow.shtml
7. You must use a good (pseudo) random number generator:
8. An experiment must be repeated a SUFFICIENT NUMBER OF TIMES for the results to be attributed to more than random error:
Coin Tossing
http://energion.com/books/science/lie_with_statistics.html
9. You must choose the CORRECT MEASURE for the question you are asking
Choosing Statistics to Report
World Income Distribution (per Person), 2000 (in 1999 dollars)
[Figure: chart of the income distribution; values shown on the chart: 1000000, 7217, 15569, 2314, 3666, 1279, 1700, 701, 964, 400]
MEAN: $6533 (arithmetic average)
MEDIAN: $1700 (half above, half below)
MODE: $633 (most frequent)
After: http://energion.com/books/science/lie_with_statistics.html
Updated data: Y. Dikhanov, Trends in World Income Distribution, 3rd Forum on Human Development, Paris, France, Jan. 17-19, 2005.
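How far the three measures can drift apart on skewed data is easy to check. The sketch below uses a small made-up list of incomes (an assumption for illustration, not the Dikhanov data), in which a few large values pull the mean well above the median and the mode.

```python
import statistics

# Hypothetical incomes, invented for illustration: a few large values skew the mean
incomes = [400, 600, 600, 600, 900, 1300, 1700, 2300, 7000, 50000]

mean = statistics.mean(incomes)      # arithmetic average
median = statistics.median(incomes)  # half above, half below
mode = statistics.mode(incomes)      # most frequent value

print("mean:", mean, " median:", median, " mode:", mode)
```

Which of the three to report depends on the question being asked: "typical" income is better described by the median or mode, total wealth per head by the mean.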
10. Reporting CORRECT results, PROPERLY DISPLAYED, is an integral part of a well-done experiment:
www.edwardtufte.com
http://energion.com/books/science/lie_with_statistics.html
10a. Telling the Whole Story
www.edwardtufte.com
Procedure:
Define the space
Explore the space
Report the results correctly
Tools for a conscientious experimenter:
--experimental design: allows us to efficiently choose which sets of experiments to run; the choice may not be unique
--statistical techniques: allow us to deal with:
    --experimental error: measure of precision
    --distinguishing correlation from causation
    --complexities of the effects under study (e.g., linearities, etc.)
Experimental design
A good reference: NIST
http://www.itl.nist.gov/div898/handbook/pri/section3/pri3.htm
Must decide:
--what are your objectives for this experiment? What is your hypothesis?
--what are the variables?
--what is the range of each variable (“level”)?
Naïve method: fix all variables but one
Correct method: choose combinations of variable values which will also show effect of interactions
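The difference between the two methods can be made concrete with a short sketch. The factor names and coded levels below are hypothetical; the point is only that one-factor-at-a-time runs never change two factors together, so they cannot reveal an interaction, while a factorial set of runs can.

```python
from itertools import product

# Hypothetical factors with two coded levels each, for illustration only
levels = {"A": [-1, +1], "B": [-1, +1], "C": [-1, +1]}
baseline = {name: vals[0] for name, vals in levels.items()}

# Naive method: vary one factor at a time around a fixed baseline
ofat_runs = [dict(baseline)]
for name, vals in levels.items():
    for v in vals[1:]:
        ofat_runs.append({**baseline, name: v})

# Factorial method: every combination of levels
factorial_runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]

def varies_together(runs, f1, f2):
    """True if some run moves both factors away from the baseline at once."""
    return any(r[f1] != baseline[f1] and r[f2] != baseline[f2] for r in runs)

print("runs:", len(ofat_runs), "one-at-a-time vs", len(factorial_runs), "factorial")
print("A and B changed together?",
      varies_together(ofat_runs, "A", "B"), "vs",
      varies_together(factorial_runs, "A", "B"))
```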
• Analyzing and Displaying Data
  – Simple Statistical Analysis
  – Comparing Results
  – Curve Fitting
• Statistics for Factorial Designs
  – 2^k Designs Including Replications
  – Full Factorial Designs
  – Fractional Factorial Designs
• Ensuring Data Meets Analysis Criteria
• Presenting Your Results; Drawing Conclusions
Important references for this part of the talk:
• Statistical tools
  – Matlab
  – The R Project for Statistical Computing: http://www.r-project.org/
• Displaying information
  – Edward Tufte, The Visual Display of Quantitative Information, Graphics Press, 2001.
  – Edward Tufte, The Cognitive Style of Powerpoint, Graphics Press, 2003.
Example: A System
[Figure: the system drawn as a “black box” with inputs and outputs]
System Inputs: Factors (Experimental Conditions)
System Outputs: Responses (Experimental Results)
Experimental Research
Define System
● Define system outputs first
● Then define system inputs
● Finally, define behavior (i.e., transfer function)
Identify Factors and Levels
● Identify system parameters that vary (many)
● Reduce parameters to important factors (few)
● Identify values (i.e., levels) for each factor
Identify Response(s)
● Identify time, space, etc. effects of interest
Design Experiments
● Identify factor-level experiments
Create and Execute System; Analyze Data
Define Workload
● Workload can be a factor (but often isn't)
● Workloads are inputs that are applied to system
Create System
● Create system so it can be executed
● Real prototype
● Simulation model
● Empirical equations
Execute System
● Execute system for each factor-level binding
● Collect and archive response data
Analyze & Display Data
● Analyze data according to experiment design
● Evaluate raw and analyzed data for errors
● Display raw and analyzed data to draw conclusions
Some Examples
Analog Simulation
– Which of three solvers is best?
– What is the system?
– Responses
  • Fastest simulation time
  • Most accurate result
  • Most robust to types of circuits being simulated
– Factors
  • Solver
  • Type of circuit model
  • Matrix data structure
Epitaxial growth
– New method using nonlinear temp profile
– What is the system?
– Responses
  • Total time
  • Quality of layer
  • Total energy required
  • Maximum layer thickness
– Factors
  • Temperature profile
  • Oxygen density
  • Initial temperature
  • Ambient temperature
Basic Descriptive Statistics for a Random Sample X
• Mean
• Median
• Mode
• Variance / standard deviation
• Z scores: Z = (X – mean) / (standard deviation)
• Quartiles, box plots
• Q-Q plot
Note: these can be deceptive. For example, if P(X = 0) = P(X = 100) = 0.5 and P(Y = 50) = 1, then X and Y have the same mean (and nastier examples can be constructed).
home.oise.utoronto.ca/~thollenstein/Exploratory%20Data%20Analysis.ppt
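The deceptive example above takes two lines to verify. The sketch below builds finite samples that follow the stated probabilities (ten values each is an arbitrary choice) and shows that the means agree while the spreads do not.

```python
import statistics

x = [0, 100] * 5   # X takes the values 0 and 100 equally often
y = [50] * 10      # Y is always 50

print("means:  ", statistics.mean(x), statistics.mean(y))      # identical
print("std dev:", statistics.pstdev(x), statistics.pstdev(y))  # very different
```

Reporting the mean alone would make X and Y look the same; reporting the standard deviation as well exposes the difference.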
Basic Descriptive Statistics for a Random Sample X: Instructive Example
Four sets of data with the same basic descriptive statistics
After F.J. Anscombe, 1973
Tufte, The Visual Display of Quantitative Information, 1983
Basic Descriptive Statistics for a Random Sample X
Graphs of Anscombe’s data
Tufte, The Visual Display of Quantitative Information, 1983
SIMPLE MODELS OF DATA
Example 1: Evaluation of a new wireless network protocol
What is the distribution of the latency per message?
System: wireless network with new protocol
Workload: 10 messages applied at single source; each message has identical configuration
Experiment output: roundtrip latency per message (ms)
Data file “latency.dat”:
Msg. #   Latency
1        22
2        23
3        19
4        18
5        15
6        20
7        26
8        17
9        19
10       17
Mean: 19.6 ms
Variance: 10.71 ms^2
Std Dev: 3.27 ms
Hypothesis: Distribution is N(μ, σ^2)
Verify Model Preconditions
Check randomness
• Use plot of residuals around mean
• Residuals “appear” random
Check normal distribution
• Use quantile-quantile (Q-Q) plot
• Pattern adheres consistently along ideal quantile-quantile line
http://itl.nist.gov/div898/software/dataplot/refman1/ch2/quantile.pdf
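Confidence Intervals
Sample mean vs. population mean: if many samples are collected, a fraction of about 1 - α of their confidence intervals will contain the “true mean”.
CI, > 30 samples: $(\bar{x} - z_{[1-\alpha/2]}\, s/\sqrt{n},\ \bar{x} + z_{[1-\alpha/2]}\, s/\sqrt{n})$
CI, < 30 samples: $(\bar{x} - t_{[1-\alpha/2;\, n-1]}\, s/\sqrt{n},\ \bar{x} + t_{[1-\alpha/2;\, n-1]}\, s/\sqrt{n})$
For the latency data, $\bar{x} = 19.6$, α = 0.05:
(17.26, 21.94)
Raj Jain, “The Art of Computer Systems Performance Analysis,” Wiley, 1991.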
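The latency numbers can be reproduced with a short sketch. It uses Python's standard library rather than the R or Matlab tools mentioned in the seminar, and it hard-codes the t quantile for 9 degrees of freedom as read from a t table (the constant name `T_975_DF9` is ours), since the standard library has no t distribution.

```python
import math
import statistics

latency = [22, 23, 19, 18, 15, 20, 26, 17, 19, 17]  # ms, the ten values in latency.dat

n = len(latency)
mean = statistics.mean(latency)
var = statistics.variance(latency)  # sample variance (divides by n - 1)
sd = math.sqrt(var)

# n < 30, so use the t quantile: 1 - alpha/2 = 0.975 with n - 1 = 9 degrees of freedom
T_975_DF9 = 2.262  # taken from a t table
half_width = T_975_DF9 * sd / math.sqrt(n)
ci = (mean - half_width, mean + half_width)

print(f"mean = {mean:.1f} ms, variance = {var:.2f} ms^2, std dev = {sd:.2f} ms")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```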
Scatter and Line Plots
Example 2: Relation between two variables: Resistance profile of doped silicon epitaxial layer
Expect linear resistance increase as depth increases
Depth   Resistance
1       1.689015
2       4.486722
3       7.915209
4       6.362388
5       11.830739
6       12.329104
7       14.011396
8       17.600094
9       19.022146
10      21.513802
Linear Regression Statistics
(hypothesis: resistance = β0 + β1*depth + error)
model = lm(Resistance ~ Depth)
summary(model)
Residuals:
     Min       1Q   Median       3Q      Max
-2.11330 -0.40679  0.05759  0.51211  1.57310
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05863    0.76366  -0.077     0.94
Depth        2.13358    0.12308  17.336 1.25e-07 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 1.118 on 8 degrees of freedom   “variance of error: (1.118)^2”
Multiple R-Squared: 0.9741, Adjusted R-squared: 0.9708
F-statistic: 300.5 on 1 and 8 DF, p-value: 1.249e-07   “evidence this estimate valid” (“prob. it occurred by chance”)
“reject hypothesis β1 = 0 (p = 1.25e-07); cannot reject β0 = 0 (p = 0.94)”
(Using R system; based on http://www.stat.umn.edu/geyer/5102/examp/reg.html)
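As a cross-check on the R output, the least-squares estimates can be computed directly from their textbook formulas. The sketch below is plain Python, not the R `lm` call used on the slide; it reproduces the slope, intercept, R-squared, and residual standard error, but not the t and p values, which need the t distribution.

```python
depth = list(range(1, 11))
resistance = [1.689015, 4.486722, 7.915209, 6.362388, 11.830739,
              12.329104, 14.011396, 17.600094, 19.022146, 21.513802]

n = len(depth)
mean_x = sum(depth) / n
mean_y = sum(resistance) / n
sxx = sum((x - mean_x) ** 2 for x in depth)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depth, resistance))

slope = sxy / sxx                      # estimate of beta1
intercept = mean_y - slope * mean_x    # estimate of beta0

residuals = [y - (intercept + slope * x) for x, y in zip(depth, resistance)]
sse = sum(e ** 2 for e in residuals)
sst = sum((y - mean_y) ** 2 for y in resistance)
r_squared = 1 - sse / sst
resid_std_err = (sse / (n - 2)) ** 0.5  # n - 2 = 8 degrees of freedom

print(f"intercept = {intercept:.5f}, slope = {slope:.5f}")
print(f"R^2 = {r_squared:.4f}, residual standard error = {resid_std_err:.3f}")
```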
Validating Residuals
Errors are marginally normally distributed due to “tails”
Comparing Two Sets of Data
Example 3: Consider two different wireless access points. Which one is faster?
Inputs: same set of 10 messages communicated through both access points.
Response (usecs):
Latency1   Latency2
22         19
23         20
19         24
18         20
15         14
20         18
26         21
17         17
19         17
17         18
Approach: Take the difference of the data and determine the CI of the difference. If the CI straddles zero, we cannot tell which access point is faster.
CI95% = (-1.27, 2.87) usecs
The confidence interval straddles zero. Thus, we cannot determine which is faster with 95% confidence.
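The paired comparison can be sketched as follows. As in the earlier latency example, the t quantile for 9 degrees of freedom is taken from a table and hard-coded (the constant name is ours); everything else comes from the two columns on the slide.

```python
import math
import statistics

latency1 = [22, 23, 19, 18, 15, 20, 26, 17, 19, 17]
latency2 = [19, 20, 24, 20, 14, 18, 21, 17, 17, 18]

# Same messages through both access points, so work with paired differences
diffs = [a - b for a, b in zip(latency1, latency2)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)

T_975_DF9 = 2.262  # t quantile for 95% confidence, 9 degrees of freedom (from a t table)
half_width = T_975_DF9 * sd_d / math.sqrt(n)
ci = (mean_d - half_width, mean_d + half_width)
straddles_zero = ci[0] < 0 < ci[1]

print(f"mean difference = {mean_d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("cannot tell which is faster" if straddles_zero else "difference is significant")
```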
Curve fitting & Plots with error bars
Example 4: Execution time of SuperLU linear system solution (Ax = b) on parallel computer
For each p, ran problem multiple times with same matrix size but different values
Determined mean and CI for each p to obtain curve and error intervals
[Figure: plot with error bars; horizontal axis: matrix density p]
Curve Fitting
> model = lm(t ~ poly(p,4))
> summary(model)
Call:
lm(formula = t ~ poly(p, 4))
Residuals:
      1       2       3       4       5       6       7       8       9
-0.4072  0.7790  0.5840 -1.3090 -0.9755  0.8501  2.6749 -3.1528  0.9564
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 236.9444     0.7908 299.636 7.44e-10 ***
poly(p, 4)1 679.5924     2.3723 286.467 8.91e-10 ***
poly(p, 4)2 268.3677     2.3723 113.124 3.66e-08 ***
poly(p, 4)3  42.8772     2.3723  18.074 5.51e-05 ***
poly(p, 4)4   2.4249     2.3723   1.022    0.364
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 2.372 on 4 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 0.9999
F-statistic: 2.38e+04 on 4 and 4 DF, p-value: 5.297e-09
Example 5: Model Validation: y’ = ax + b
R^2 – Coefficient of Determination
“How well does the data fit your model?”
What proportion of the “variability” is accounted for by the statistical model? (What is the ratio of explained variation to total variation?)
Suppose we have measurements y1, y2, …, yn with mean m
and predicted values y1’, y2’, …, yn’ (yi’ = axi + b, so that yi = yi’ + ei)
$SSE = \sum_i (y_i - y_i')^2 = \sum_i e_i^2$ (sum of squared errors)
$SST = \sum_i (y_i - m)^2$ (total sum of squares)
$SSR = SST - SSE = \sum_i (y_i' - m)^2$ (regression sum of squares, i.e., the explained variation)
$R^2 = SSR/SST = (SST - SSE)/SST$
R^2 is a measure of how good the model is. The closer R^2 is to 1 the better.
Example: Let SST = 1499 and SSE = 97.
Then R^2 = (1499 - 97)/1499 = 93.5%
http://www-stat.stanford.edu/~jtaylo/courses/stats191/notes/simple_diagnostics.pdf
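The definition translates directly into code. In the sketch below the helper name `r_squared` and the tiny perfect-fit data set are ours, added for illustration; the SST and SSE values are the ones from the example.

```python
def r_squared(y, y_pred):
    """Coefficient of determination: (SST - SSE) / SST."""
    m = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # unexplained variation
    sst = sum((yi - m) ** 2 for yi in y)                     # total variation
    return (sst - sse) / sst

# The numbers from the example: SST = 1499, SSE = 97
sst, sse = 1499, 97
r2_example = (sst - sse) / sst
print(f"R^2 = {100 * r2_example:.1f}%")

# A perfect fit (hypothetical data) explains all of the variation
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```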
Example 6: Using the t-test to compare 2 means
Consider the following data (“sleep.R”)
     extra   group
1     0.7    1
2    -1.6    1
3    -0.2    1
4    -1.2    1
5    -0.1    1
6     3.4    1
7     3.7    1
8     0.8    1
9     0.0    1
10    2.0    1
11    1.9    2
12    0.8    2
13    1.1    2
14    0.1    2
15   -0.1    2
16    4.4    2
17    5.5    2
18    1.6    2
19    4.6    2
20    3.4    2
From “Introduction to R”, http://www.R-project.org
T.test result
> t.test(extra ~ group, data = sleep)
        Welch Two Sample t-test
data:  extra by group
t = -1.8608, df = 17.776, p-value = 0.0794
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y
     0.75      2.33
The p-value is the smallest significance level (1 - confidence) at which the null hypothesis can be rejected. A p-value of 0.0794 means the difference is nonzero only at confidence levels up to about 92%; it is not significant at the 95% level, which agrees with the 95% confidence interval containing 0.
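The t statistic and degrees of freedom reported by R can be reproduced from the standard Welch formulas. This is a plain-Python sketch (the helper name `welch_t` is ours); it stops short of the p-value, which requires the t distribution and is not in the standard library.

```python
import math
import statistics

group1 = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
group2 = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t, df = welch_t(group1, group2)
print(f"t = {t:.4f}, df = {df:.3f}")
```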
Factorial Design—Another Example
What “factors” need to be taken into account?
How do we design an efficient experiment to test all these factors?
How much do the factors and the interactions among the factors contribute to the variation in results?
Example: 3 factors a, b, c, each with 2 values: 8 combinations
But what if we want random order of experiments?
What if each of a, b, c has 3 values?
Do we need to run all experiments?
http://www.itl.nist.gov/div898/handbook/pri/section3/pri3332.htm
Standard Procedure-Full Factorial Design
(Example) Variables A, B, C: each with 3 values, Low, Medium, High (coded as -1, 0, 1)
“Signs Table”:
A B C
1 -1 -1 -1
2 +1 -1 -1
3 -1 +1 -1
4 +1 +1 -1
5 -1 -1 +1
6 +1 -1 +1
7 -1 +1 +1
8 +1 +1 +1
1. Run the experiments in the table (“2 level, full factorial design”)
2. Repeat the experiments in this order n times by using rows 1,…,8,1,…,8, … (“replication”)
3. Use step 2, but choose the rows randomly (“randomization”)
4. Use step 3, but add some “center point runs”, for example, run the case 0,0,0, then use 8 rows, then run 0,0,0, … finish with a 0,0,0 case
In general, for 5 or more factors, use a “fractional factorial design”
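The four steps can be sketched as a small run-plan generator. The function names, the seed, and the choice to shuffle within each replicate are our assumptions; the procedure on the slide leaves room for other randomization schemes.

```python
import random
from itertools import product

def two_level_full_factorial(k):
    """Rows of a 2^k signs table in standard order (first factor varies fastest)."""
    return [tuple(reversed(row)) for row in product((-1, +1), repeat=k)]

def run_plan(k, replications, seed, center_points=True):
    """Replicated, randomized 2^k plan with optional center-point runs."""
    rng = random.Random(seed)      # seeded so the plan itself is repeatable
    base = two_level_full_factorial(k)
    center = (0,) * k
    plan = []
    for _ in range(replications):
        block = base[:]
        rng.shuffle(block)         # step 3: random order within each replicate
        if center_points:
            plan.append(center)    # step 4: center run before each block of 2^k rows
        plan.extend(block)
    if center_points:
        plan.append(center)        # finish with a center run
    return plan

table = two_level_full_factorial(3)
plan = run_plan(3, replications=2, seed=1)
for run in plan:
    print(run)
```

With k = 3 and two replications the plan has 2 x 8 factorial runs plus 3 center runs.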
2^k Factorial Design
Example: k = 2, factors are A, B, and the x’s are computed from the signs table (with $x_{AB} = x_A x_B$):
$y = q_0 + q_A x_A + q_B x_B + q_{AB} x_{AB}$
SST = total variation around the mean: $SST = \sum_i (y_i - \bar{y})^2 = SSA + SSB + SSAB$
where $SSA = 2^2 q_A^2$ (variation allocated to A), and SSB and SSAB are defined similarly
Note: $\mathrm{var}(y) = SST/(2^k - 1)$
Fraction of variation explained by A = SSA/SST
A B
1 -1 -1
2 +1 -1
3 -1 +1
4 +1 +1
Example 7: 2^k Design
Are all factors needed? If a factor has little effect on the variability of the output, why study it further?
Method?
a. Evaluate variation for each factor using only two levels each
b. Must consider interactions as well
Interaction: effect of a factor dependent on the levels of another
System: Cache; input: Address Trace; response: Misses
Factor               Levels
Line Length (L)      32, 512 words
No. Sections (K)     4, 16 sections
Control Method (C)   multiplexed, linear
Experiment Design
L     K    C     Misses
32    4    mux
512   4    mux
32    16   mux
512   16   mux
32    4    lin
512   4    lin
32    16   lin
512   16   lin
Encoded Experiment Design
L    K    C    Misses
-1   -1   -1
 1   -1   -1
-1    1   -1
 1    1   -1
-1   -1    1
 1   -1    1
-1    1    1
 1    1    1
www.stat.nuk.edu.tw/Ray-Bing/ex-design/ex-design/ExChapter6.ppt
Example: 2^k Design (continued)
Obtain Responses
L    K    C    Misses
-1   -1   -1   14
 1   -1   -1   22
-1    1   -1   10
 1    1   -1   34
-1   -1    1   46
 1   -1    1   58
-1    1    1   50
 1    1    1   86
Analyze Results (Sign Table)
I    L    K    C    LK   LC   KC   LKC   Miss. Rate (yj)
1   -1   -1   -1    1    1    1   -1    14
1    1   -1   -1   -1   -1    1    1    22
1   -1    1   -1   -1    1   -1    1    10
1    1    1   -1    1   -1   -1   -1    34
1   -1   -1    1    1   -1   -1    1    46
1    1   -1    1   -1    1   -1   -1    58
1   -1    1    1   -1   -1    1   -1    50
1    1    1    1    1    1    1    1    86
qi: 40   10   5    20   5    2    3    1
$q_i = \frac{1}{2^3} \sum_j \mathrm{sign}_{ij}\, y_j$
Ex: y1 = 14 = q0 – qL – qK – qC + qLK + qLC + qKC – qLKC
Solve for q’s
$SSL = 2^3 q_L^2 = 800$
SST = SSL+SSK+SSC+SSLK+SSLC+SSKC+SSLKC = 800+200+3200+200+32+72+8 = 4512
%variation(L) = SSL/SST = 800/4512 = 17.7%
Effect   % Variation
L        17.7
K        4.4
C        70.9
LK       4.4
LC       0.7
KC       1.6
LKC      0.2
http://www.cs.wustl.edu/~jain/cse567-06/ftp/k_172kd/sld001.htm
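The sign-table analysis is mechanical enough to script. The sketch below recomputes the effects q and the allocation of variation from the eight measured miss rates; the helper `sign` and the dictionary layout are our choices, not part of the original material. Each SS term is 2^3 times the squared effect.

```python
from itertools import product
from math import prod

factors = ["L", "K", "C"]
# Runs in standard order (L varies fastest) with the measured miss rates
runs = [tuple(reversed(r)) for r in product((-1, 1), repeat=3)]
y = [14, 22, 10, 34, 46, 58, 50, 86]

effects = ["I", "L", "K", "C", "LK", "LC", "KC", "LKC"]

def sign(effect, run):
    """Sign-table entry: product of the coded levels of the factors named in the effect."""
    return prod(run[factors.index(f)] for f in effect if f in factors)

# q_i = (1/2^3) * sum_j sign_ij * y_j
q = {e: sum(sign(e, r) * yj for r, yj in zip(runs, y)) / len(runs) for e in effects}

# Variation allocated to each effect, and its share of the total
ss = {e: len(runs) * q[e] ** 2 for e in effects if e != "I"}
sst = sum(ss.values())
pct = {e: round(100 * v / sst, 1) for e, v in ss.items()}

print("q:  ", q)
print("SST:", sst)
print("%:  ", pct)
```

With these responses the control method C accounts for most of the variation, followed by the line length L.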
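Full Factorial Design
Model: $y_{ij} = m + a_i + b_j + e_{ij}$
Effects computed such that $\sum_i a_i = 0$ and $\sum_j b_j = 0$
$m = \mathrm{mean}(y_{..})$
$a_i = \mathrm{mean}(y_{i.}) - m$
$b_j = \mathrm{mean}(y_{.j}) - m$
Experimental Errors
$SSE = \sum_i \sum_j e_{ij}^2$
$SS0 = ab\,m^2$
$SSA = b \sum_i a_i^2$
$SSB = a \sum_j b_j^2$
$SSY = \sum_i \sum_j y_{ij}^2 = SS0 + SSA + SSB + SSE$, so the variation around the mean is $SST = SSY - SS0 = SSA + SSB + SSE$
(here a and b in the sums of squares denote the number of levels of factors A and B)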
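A small numerical check makes the decomposition less abstract. The 2 x 3 table of responses below is invented for illustration; the sketch computes the grand mean, the row and column effects, and the errors, and then confirms that the sum of squared responses splits into SS0 + SSA + SSB + SSE.

```python
# Hypothetical responses: rows are the levels of factor A, columns the levels of factor B
y = [[10.0, 12.0, 14.0],
     [20.0, 19.0, 27.0]]
a, b = len(y), len(y[0])   # number of levels of A and of B

m = sum(map(sum, y)) / (a * b)                                      # grand mean
a_eff = [sum(row) / b - m for row in y]                             # effects of A
b_eff = [sum(y[i][j] for i in range(a)) / a - m for j in range(b)]  # effects of B
e = [[y[i][j] - m - a_eff[i] - b_eff[j] for j in range(b)] for i in range(a)]

ss0 = a * b * m ** 2
ssa = b * sum(x ** 2 for x in a_eff)
ssb = a * sum(x ** 2 for x in b_eff)
sse = sum(x ** 2 for row in e for x in row)
ssy = sum(x ** 2 for row in y for x in row)   # sum of squared responses

print("effects of A:", a_eff, " effects of B:", b_eff)
print("SS0 + SSA + SSB + SSE =", ss0 + ssa + ssb + sse, " sum of y^2 =", ssy)
```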
Example 8: Full-Factorial Design Example
Determination of the speed of light
Morley Experiments
Factors: Experiment No. (Expt), Run No. (Run)
Levels: Expt – 5 experiments; Run – 20 repeated runs
      Expt   Run   Speed
001   1      1     850
002   1      2     740
003   1      3     900
004   1      4     1070
<more data>
019   1      19    960
020   1      20    960
021   2      1     960
022   2      2     940
023   2      3     960
<more data>
096   5      16    940
097   5      17    950
098   5      18    800
099   5      19    810
100   5      20    870
Two-Factor Full Factorial
> fm <- aov(Speed~Run+Expt, data=mm)   # Determine ANOVA
> summary(fm)                          # Display ANOVA of factors
            Df Sum Sq Mean Sq F value   Pr(>F)
Run         19 113344    5965  1.1053 0.363209
Expt         4  94514   23629  4.3781 0.003071 **
Residuals   76 410166    5397
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Conclusion: Variation across runs is not significant (p = 0.36), but variation between experiments is significant (p = 0.003)
What if there are more factors?
Total number of experiments = (#levels)^(#factors)
What if there are 3 levels and 6 factors? 3^6 = 729 runs
If we use replication, there are even more runs
Computer experiments: not such a problem, computer is doing the work
Lab experiments: time, materials, and technicians’ salaries can add up
An alternative: fractional factorial design
Example: 2^(3-1)
From the entries in the table we are able to compute all `effects' such as main effects, first-order `interaction' effects, etc.
For example, to compute the main effect estimate `c1' of factor X1, we compute the average response at all runs with X1 at the `high' setting, namely (1/4)(y2 + y4 + y6 + y8), minus the average response of all runs with X1 set at `low,' namely (1/4)(y1 + y3 + y5 + y7). That is,
c1 = (1/4)(y2 + y4 + y6 + y8) - (1/4)(y1 + y3 + y5 + y7) = (1/4)(63+57+51+53) - (1/4)(33+41+57+59) = 8.5
TABLE 3.11 A 2^3 Two-level, Full Factorial Design Table Showing Runs in `Standard Order,' Plus Observations (yj)
X1 X2 X3 Y
1 -1 -1 -1 y1 = 33
2 +1 -1 -1 y2 = 63
3 -1 +1 -1 y3 = 41
4 +1 +1 -1 y4 = 57
5 -1 -1 +1 y5 = 57
6 +1 -1 +1 y6 = 51
7 -1 +1 +1 y7 = 59
8 +1 +1 +1 y8 = 53
We computed c1 = 8.5
Suppose, however, that we only have enough resources to do four runs. Is it still possible to estimate the main effect for X1? Or any other main effect? The answer is yes, and there are even different choices of the four runs that will accomplish this. For example, suppose we select only the four light (unshaded) corners of the design cube. Using these four runs (1, 4, 6 and 7), we can still compute c1 as follows:
c1 = (1/2) (y4 + y6) - (1/2) (y1 + y7) = (1/2) (57+51) - (1/2) (33+59) = 8.
Similarly, we would compute c2, the effect due to X2, as
c2 = (1/2) (y4 + y7) - (1/2) (y1 + y6) = (1/2) (57+59) - (1/2) (33+51) = 16.
Finally, the computation of c3 for the effect due to X3 would be
c3 = (1/2) (y6 + y7) - (1/2) (y1 + y4) = (1/2) (51+59) - (1/2) (33+57) = 10.
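The main-effect arithmetic for both the full design and the four-run subset can be checked with a short sketch. The helper `main_effect` and the dictionary layout are ours; the runs and responses are those of the 2^3 table.

```python
# Coded settings (X1, X2, X3) and observed responses, keyed by run number
runs = {1: (-1, -1, -1), 2: (+1, -1, -1), 3: (-1, +1, -1), 4: (+1, +1, -1),
        5: (-1, -1, +1), 6: (+1, -1, +1), 7: (-1, +1, +1), 8: (+1, +1, +1)}
y = {1: 33, 2: 63, 3: 41, 4: 57, 5: 57, 6: 51, 7: 59, 8: 53}

def main_effect(factor, selected):
    """Average response at the high setting minus average response at the low setting."""
    hi = [y[r] for r in selected if runs[r][factor] == +1]
    lo = [y[r] for r in selected if runs[r][factor] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

full = [main_effect(f, range(1, 9)) for f in range(3)]    # all eight runs
half = [main_effect(f, (1, 4, 6, 7)) for f in range(3)]   # the four light corners

print("full factorial c1, c2, c3:", full)
print("half fraction  c1, c2, c3:", half)
```

The half-fraction estimates differ from the full-factorial ones because each main effect is mixed with a two-factor interaction; here the gap is small for c1 and large for c2.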
We could also have used the four dark (shaded) corners of the design cube for our runs and obtained similar, but slightly different, estimates for the main effects. In either case, we would have used half the number of runs that the full factorial requires. The half fraction we used is a new design written as 2^(3-1).
Note that 2^(3-1) = 2^3/2 = 2^2 = 4, which is the number of runs in this half-fraction design.
Constructing the 2^(3-1) half-fraction design (example)
We start with Table I.
We need to add a third column. We do it by adding the X1*X2 interaction column to get Table II.
We may now substitute `X3' in place of `X1*X2' to get Table III, which amounts to the dark-shaded corners.
If we had set X3 = -X1*X2 as the rule for generating the third column of our 2^(3-1) design, we would have obtained Table IV, the light-shaded corners.
TABLE I A Standard Order 2^2 Full Factorial Design Table
X1 X2
1 -1 -1
2 +1 -1
3 -1 +1
4 +1 +1
TABLE II A 2^2 Design Table Augmented with the X1*X2 Interaction Column `X1*X2'
X1 X2 X1*X2
1 -1 -1 +1
2 +1 -1 -1
3 -1 +1 -1
4 +1 +1 +1
TABLE III A 2^(3-1) Design Table with Column X3 set to X1*X2
X1 X2 X3
1 -1 -1 +1
2 +1 -1 -1
3 -1 +1 -1
4 +1 +1 +1
TABLE IV A 2^(3-1) Design Table with Column X3 set to -X1*X2
X1 X2 X3
1 -1 -1 -1
2 +1 -1 +1
3 -1 +1 +1
4 +1 +1 -1
Confounding and Sparsity of Effects
Confounding means we have lost the ability to estimate some effects and/or interactions
One price we pay for using the design table column X1*X2 to obtain column X3 is our inability to obtain an estimate of the interaction effect for X1*X2 (i.e., c12) that is separate from an estimate of the main effect for X3.
In other words, we have confounded the main effect estimate for factor X3 (i.e., c3) with the estimate of the interaction effect for X1 and X2 (i.e., with c12). The whole issue of confounding is fundamental to the construction of fractional factorial designs.
Sparsity of effects assumption
In using the 2^(3-1) design, we also assume that c12 is small compared to c3; this is called a `sparsity of effects' assumption. Our computation of c3 is in fact a computation of c3 + c12. If the desired effects are only confounded with non-significant interactions, then we are OK.
NOTE: THIS MEANS YOU NEED GOOD UNDERSTANDING OF YOUR DATA AND OF THE PROBLEM YOU ARE TRYING TO SOLVE!
Note: we can define general procedure to construct valid fractional designs.
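Confounding can be seen numerically with the responses from the 2^3 example used earlier (y1..y8 = 33, 63, 41, 57, 57, 51, 59, 53). The sketch below builds the half fraction generated by X3 = X1*X2 and shows that its estimate of c3 equals c3 + c12 from the full design; the helper `effect` is ours.

```python
from itertools import product

# Full 2^3 design in standard order (X1 varies fastest) with the observed responses
runs = [tuple(reversed(r)) for r in product((-1, 1), repeat=3)]
y = [33, 63, 41, 57, 57, 51, 59, 53]

def effect(column, rows):
    """Average response where the column is +1 minus average where it is -1."""
    hi = [y[i] for i in rows if column(runs[i]) == +1]
    lo = [y[i] for i in rows if column(runs[i]) == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

all_rows = range(8)
c3 = effect(lambda r: r[2], all_rows)           # main effect of X3, full design
c12 = effect(lambda r: r[0] * r[1], all_rows)   # X1*X2 interaction, full design

# Half fraction generated by X3 = X1*X2
half_rows = [i for i in all_rows if runs[i][2] == runs[i][0] * runs[i][1]]
c3_half = effect(lambda r: r[2], half_rows)

print("full design: c3 =", c3, " c12 =", c12)
print("half fraction estimate of c3 =", c3_half, "= c3 + c12 =", c3 + c12)
```

In this data set c12 is not small compared to c3, so the sparsity assumption would be a poor one here, which is exactly why the problem must be understood before choosing the fraction.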
Visualizing Results: Tufte’s Principles
• Have a properly chosen format and design
• Use words, numbers, and drawing together
• Reflect a balance, a proportion, a sense of relevant scale
• Display an accessible complexity of detail
• Have a story to tell about the data
• Draw in a professional manner
• Avoid content-free decoration, including “chart junk”
Presenting Your Results: Dilbert on Powerpoint (PPt)
Now, about Powerpoint© presentations…….
http://www.dilbert.com/
A picture is not always worth 1,000 words….
http://www.dilbert.com/
And it is easy to get carried away by enthusiasm for your subject……
http://www.dilbert.com/
Presenting Your Results: Tufte on Powerpoint (PPt)
• PPt REDUCES THE ANALYTICAL QUALITY of serious presentations of evidence
• This is especially true for PPt ready-made templates, which CORRUPT STATISTICAL REASONING, and often WEAKEN VERBAL AND SPATIAL THINKING
• Statistical graphics produced by PPt are astonishingly thin, NEARLY CONTENT-FREE
• For words, impoverished space encourages IMPRECISE STATEMENTS, SLOGANS, ABRUPT AND THINLY-ARGUED CLAIMS
• PPt suffers from NARROW BANDWIDTH & RELENTLESS SEQUENCING
• Audience members need at least one mode of information that ALLOWS THEM TO CONTROL THE ORDER AND PACE OF LEARNING
• Ex: Columbia spacecraft report (made while it was still in the air): bullets and outline format obscured the important points about the problem with the tiles (2nd disaster)
Visualizing Results: Tufte’s Principles Applied to PPt
• Have a properly chosen format and design
• Use words, numbers, and drawing together
• Reflect a balance, a proportion, a sense of relevant scale
• Display an accessible complexity of detail
• Have a story to tell about the data
• Draw in a professional manner
• Avoid content-free decoration, including “chart junk”
• Don’t use PPt gimmicks such as line-by-line sequencing
• Provide nonsequential medium in addition to PPt
Since there aren’t really any good alternatives,…….