www.lisa.stat.vt.edu
Laboratory for Interdisciplinary Statistical Analysis
1948: The Statistical Laboratory was founded as a division of the Virginia Agricultural Experiment Station to
help agronomists design experiments and calculate sums of squares.
1949: Based on the success of the Statistical Laboratory, the Department of Statistics at Virginia Polytechnic
Institute (VPI) was founded—the 3rd oldest statistics department in the United States.
1973: The Statistical Laboratory was re-formed as the Statistical Consulting Center to assist with statistical
analyses in every college of Virginia Polytechnic Institute & State University (VPI&SU).
2007: The Graduate Student Assembly led a movement to save statistical consulting and collaboration from
death by budget cuts, ensuring that graduate students could receive help with their research.
The College of Science, Provost, Vice President of Research, Graduate School, and six additional colleges agreed that researchers should be able to receive free
statistical consulting and collaboration.
2008: The Statistical Consulting Center was re-organized as the Laboratory for Interdisciplinary
Statistical Analysis (LISA) to collaborate with researchers across the Virginia Tech (VT) campuses.
Established in 2008
Year  Clients  Hours
2000      299   1368
2001      293   1938
2002      321   2220
2003      304   2192
2004      274   1775
2005      211    495
2006      171    541
2007      190    965
2008      895   2184
2009      719   3093
2010     1124   4420
[Figure: clients per year, 2000 to 2010]
[Figure: hours per year, 2000 to 2010]
LISA helps VT researchers benefit
from the use of Statistics
Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...)
Our goal is to improve the quality of research and the use of statistics at
Virginia Tech.
Collaboration
LISA statisticians meet with faculty, staff, and graduate students to understand their research and think of ways to help them using statistics.
Walk-In Consulting
Every day from 1-3 PM, clients get answers to their (quick) questions about using statistics in their research.
Short Courses
Short Courses are designed to teach graduate students how to apply statistics in their research.
All services are FREE for VT researchers. We assist with research—not class projects or homework.
How can LISA help?
• Formulate research question.
• Screen data for integrity and unusual observations.
• Implement graphical techniques to showcase the data – what is the story?
• Develop and implement an analysis plan to address research question.
• Help interpret results.
• Communicate! Help with writing the report or giving the talk.
• Identify future research directions.
To request a collaboration meeting go to www.lisa.stat.vt.edu
1. Sign in to the website using your VT PID and password.
2. Enter your information (email address, college, etc.).
3. Describe your project (project title, research goals, specific research questions, whether you have already collected data, special requests, etc.).
4. Wait 0-3 days, then contact the LISA collaborators assigned to your project to schedule an initial meeting.
Introduction to R
• R is a free software environment for statistical computing and graphics. Download: http://www.r-project.org/
• Topics Covered:
◦ Data objects in R, loops, import/export datasets, data manipulation
◦ Graphing
◦ Basic Analyses: T-tests, Regression, ANOVA
Linear Regression & Structural Equation Modeling
• Linear regression is used to model the relationship between a continuous response and a continuous predictor.
• SEM is a modeling technique that investigates causal relationships among variables.
• Topics include time-related latent variables, modification indices and critical ratios in exploratory analyses, and computation of implied moments, factor score weights, total effects, and indirect effects.
Generalized Linear Models
• Modeling technique for situations where the errors are not necessarily normal.
• Can handle situations where you have binary responses, counts, etc.
• Uses a link function to relate the response to the linear model.
• Covers: basic statistical concepts of GLMs and how they relate to regression with normal errors.
Mixed Models and Random Effects
• Mixed Model: A statistical model that has both random effects and fixed effects.
• Fixed Effect: Levels of the factor are predetermined.
• Random Effect: Levels of the factor were chosen at random.
• The primary focus of the course will be to identify scenarios where a mixed model approach will be appropriate. The concepts will be explained almost wholly through examples in SAS or in R.
T-Tests and Analysis of Variance
Anne Ryan
Criminal Trial
Defense: Represents the accused (defendant).
Prosecution: Holds the “Burden of Proof,” the obligation to shift the assumed conclusion from an oppositional opinion to one’s own position through evidence.
What’s the Assumed Conclusion?
ANSWER: The accused is innocent until proven guilty.
• The prosecution must convince the judge/jury that the defendant is guilty beyond a reasonable doubt.
Similarities between Criminal Trials and Hypothesis Testing
Burden of Proof: the obligation to shift the conclusion using evidence
• Trial: Innocent until proven guilty.
• Hypothesis Test: Accept the status quo (what is believed before) until the data suggest otherwise.
Similarities between Criminal Trials and Hypothesis Testing
Decision Criteria
• Trial: Evidence has to be convincing beyond a reasonable doubt.
• Hypothesis Test: A result this extreme occurs by chance less than 100α% of the time (e.g., 5%) when the status quo is true.
Introduction to Hypothesis Testing
Hypothesis Test: Procedure for examining a claim about the value of a parameter
◦ e.g., the population mean μ
Hypothesis tests are very methodical, with several key pieces.
Steps in a Hypothesis Test
1. Test
2. Assumptions
3. Hypotheses
4. Mechanics
5. Conclusion
1. Test
State the name of the testing method to be used.
It is important to be on the right track from the very beginning.
Hypothesis tests we will perform:
◦ One-sample t test for μ
◦ Two-sample t test for μ1 − μ2
◦ Paired t test
◦ ANOVA
2. Assumptions
List all the assumptions required for your test to be valid.
All tests have assumptions.
If the assumptions are not met, you should still comment on how this affects your results.
3. Hypotheses
State the hypothesis of interest.
There are two hypotheses:
◦ Null Hypothesis: Denoted H0
◦ Alternative Hypothesis: Denoted Ha or H1
Example of possible hypotheses: H0: μ = 13 vs. Ha: μ ≠ 13
3. Hypotheses Continued
For hypothesis testing there are three popular versions of testing:
◦ Left Tailed Hypothesis Test
◦ Right Tailed Hypothesis Test
◦ Two Tailed or Two Sided Hypothesis Test
3. Hypotheses Continued
1. Left Tailed Hypothesis Test: Researchers are only interested in whether the true value is below the hypothesized value.
e.g., H0: μ = μ0 vs. Ha: μ < μ0
2. Right Tailed Hypothesis Test: Researchers are only interested in whether the true value is above the hypothesized value.
e.g., H0: μ = μ0 vs. Ha: μ > μ0
3. Hypotheses Continued
3. Two Tailed or Two Sided Hypothesis Test: The researcher is interested in looking both above and below the hypothesized value.
e.g., H0: μ = μ0 vs. Ha: μ ≠ μ0
3. Hypotheses Continued
Three Requirements for Stating Hypotheses:
1. Two complementary hypotheses: H0 and Ha, or H0 and H1.
2. A parameter on which the test is based, e.g., μ.
3. A hypothesized value for the parameter, denoted μ0, which generally takes on a numeric value in practice.
4. Mechanics
Computational Part of the Test
What is part of the Mechanics step?
◦ Stating the Significance Level
◦ Finding the Rejection Rule
◦ Computing the Test Statistic
◦ Computing the p-value
4. Mechanics Continued
Significance Level: Here we choose a value to use as the significance level, the threshold below which a p-value leads us to reject the null hypothesis.
Denoted by α.
Default value is α = .05; use α = .05 unless otherwise noted!
4. Mechanics Continued
Rejection Rule: State our criteria for rejecting the null hypothesis.
◦ “Reject the null hypothesis if p-value < .05.”
p-value: The probability, assuming the null hypothesis is true, of obtaining a point estimate at least as “extreme” as the current value, where the definition of “extreme” is taken from the alternative hypothesis.
4. Mechanics Continued
Test Statistic: Compute the test statistic, which is usually a standardization of your point estimate.
It translates your point estimate, a statistic, to follow a known distribution so that it can be used for a test.
4. Mechanics Continued
p-value: After computing the test statistic, you can compute the p-value.
Use software to compute p-values.
5. Conclusion
Conclusion: The last step of the hypothesis test, just as it is the last step when computing confidence intervals.
Conclusions should always include:
◦ Decision: reject or fail to reject
◦ Linkage: why you made the decision (interpret the p-value)
◦ Context: what your decision means in the context of the problem
5. Conclusion
Note: Your decision can only be one of two choices:
1. Reject H0: the data give strong indication that Ha is more likely.
2. Fail to Reject H0: the data give no strong indication that Ha is more likely.
When conducting hypothesis tests, we assume that H0 is true; therefore, the decision CANNOT be to accept the null hypothesis.
One Sample T-Test
Used to test whether the population mean is different from a specified value.
Example: Is the mean height of 12-year-old girls greater than 60 inches?
http://office.microsoft.com/en-us/images
Step 1: Formulate the Hypotheses
The population mean is not equal to a specified value.
Null Hypothesis, H0: μ = μ0
Alternative Hypothesis, Ha: μ ≠ μ0
The population mean is greater than a specified value.
H0: μ = μ0
Ha: μ > μ0
The population mean is less than a specified value.
H0: μ = μ0
Ha: μ < μ0
Step 2: Check the Assumptions
• The sample is random.
• The population from which the sample is drawn is normal, or the sample size is large.
Steps 3-5
Step 3: Calculate the test statistic:
$$t = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$$
where
$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$
Step 4: Calculate the p-value based on the appropriate alternative hypothesis.
Step 5: Write a conclusion.
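As a rough illustration of Step 3, the one-sample test statistic can be computed by hand in a few lines of Python. This is only a sketch: the function name `one_sample_t` and the sample values are hypothetical (they are not the iris measurements), and the p-value in Step 4 is still left to statistical software such as JMP or R.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0 (hypothetical helper)."""
    n = len(sample)
    ybar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 divisor
    t = (ybar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Made-up measurements, used only to show the call.
widths = [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4]
t_stat, df = one_sample_t(widths, mu0=3.5)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic is then compared with a t distribution on n - 1 degrees of freedom, using the tail or tails named in the alternative hypothesis.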
Iris Example
A researcher would like to know whether the mean sepal width of a variety of irises is different from 3.5 cm. Use α = 0.05.
The researcher randomly selects 50 irises and measures the sepal width.
Step 1: Hypotheses
H0: μ = 3.5 cm
Ha: μ ≠ 3.5 cm
http://en.wikipedia.org/wiki/Iris_flower_data_set
JMP Steps 2-4: JMP Demonstration
Analyze → Distribution
Y, Columns: Sepal Width
Normal Quantile Plot
Test Mean
Specify Hypothesized Mean: 3.5
JMP Output
Step 5 Conclusion: Fail to reject H0 since the p-value = 0.1854 is greater than 0.05. There is not sufficient sample evidence to indicate that the mean sepal width is different from 3.5 cm.
Two Sample T-Test
Two sample t-tests are used to determine whether the population mean of one group is equal to, larger than, or smaller than the population mean of another group.
Example: Is the mean cholesterol of people taking drug A lower than the mean cholesterol of people taking drug B?
Step 1: Formulate the Hypotheses
The population means of the two groups are not equal.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
The population mean of group 1 is greater than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 > μ2
The population mean of group 1 is less than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 < μ2
Step 2: Check the Assumptions
• The two samples are random and independent.
• The populations from which the samples are drawn are normal, or the sample sizes are large.
• The populations have the same standard deviation.
Steps 3-5
Step 3: Calculate the test statistic:
$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
Step 4: Calculate the appropriate p-value.
Step 5: Write a conclusion.
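The pooled two-sample statistic from Step 3 can be sketched the same way. The helper name `pooled_two_sample_t` and the two small samples are hypothetical, and the sketch assumes equal population standard deviations, as Step 2 requires for the pooled test.

```python
import math
import statistics


def pooled_two_sample_t(y1, y2):
    """Return (t, df) for the pooled two-sample t test (hypothetical helper)."""
    n1, n2 = len(y1), len(y2)
    s1_sq = statistics.variance(y1)  # sample variances, n - 1 divisor
    s2_sq = statistics.variance(y2)
    # Pooled standard deviation s_p
    sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t = (statistics.mean(y1) - statistics.mean(y2)) / (
        sp * math.sqrt(1 / n1 + 1 / n2)
    )
    return t, n1 + n2 - 2


# Made-up measurements for two groups, used only to show the call.
group_1 = [3.5, 3.0, 3.2, 3.1, 3.6]
group_2 = [3.2, 3.2, 3.1, 2.3, 2.8]
t_stat, df = pooled_two_sample_t(group_1, group_2)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The degrees of freedom are n1 + n2 - 2 because one mean is estimated in each group.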
Two Sample Example
A researcher would like to know whether the mean sepal width of setosa irises is different from the mean sepal width of versicolor irises.
The researcher randomly selects 50 setosa irises and 50 versicolor irises and measures their sepal widths.
Step 1: Hypotheses
H0: μsetosa = μversicolor
Ha: μsetosa ≠ μversicolor
http://en.wikipedia.org/wiki/Iris_flower_data_set
http://en.wikipedia.org/wiki/Iris_versicolor
JMP Steps 2-4: JMP Demonstration
Analyze → Fit Y By X
Y, Response: Sepal Width
X, Factor: Species
Means/ANOVA/Pooled t
Normal Quantile Plot → Plot Actual by Quantile
JMP Output
Step 5 Conclusion: There is strong evidence (p-value < 0.0001) that the mean sepal widths for the two varieties are different.
[Figure: normal quantile plots of sepal width for setosa and versicolor]
Paired T-Test
The paired t-test is used to compare the population means of two groups when the samples are dependent.
Example: A researcher would like to determine if background noise causes people to take longer to complete math problems. The researcher gives 20 subjects two math tests, one in complete silence and one with background noise, and records the time each subject takes to complete each test.
Step 1: Formulate the Hypotheses
The population mean difference is not equal to zero.
H0: μdifference = 0
Ha: μdifference ≠ 0
The population mean difference is greater than zero.
H0: μdifference = 0
Ha: μdifference > 0
The population mean difference is less than zero.
H0: μdifference = 0
Ha: μdifference < 0
Step 2: Check the Assumptions
• The sample is random.
• The data are matched pairs.
• The differences have a normal distribution, or the sample size is large.
Steps 3-5
Step 3: Calculate the test statistic:
$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$
where $\bar{d}$ is the mean of the differences and $s_d$ is the standard deviation of the differences.
Step 4: Calculate the p-value.
Step 5: Write a conclusion.
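The paired statistic reduces to a one-sample calculation on the differences, which the following sketch tries to make explicit. The name `paired_t` and the before/after values are hypothetical; they are not the flexibility data used in the example.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for H0: mean(after - before) = 0 (hypothetical helper)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # standard deviation of the differences
    t = (d_bar - 0) / (s_d / math.sqrt(n))
    return t, n - 1


# Made-up paired measurements, used only to show the call.
before = [18.5, 21.5, 16.5, 21.0, 20.0, 15.0]
after = [19.25, 21.75, 16.5, 20.5, 22.25, 16.0]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

Here n is the number of pairs, not the total number of measurements, so the degrees of freedom are the number of pairs minus one.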
Paired T-Test Example
A researcher would like to determine whether a fitness program increases flexibility. The researcher measures the flexibility (in inches) of 12 randomly selected participants before and after the fitness program.
Step 1: Formulate a Hypothesis
H0: μAfter - Before = 0
Ha: μAfter - Before > 0
http://office.microsoft.com/en-us/images
Paired T-Test Example
Steps 2-4: JMP Analysis
Create a new column of After – Before
Analyze → Distribution
Y, Columns: After – Before
Normal Quantile Plot
Test Mean
Specify Hypothesized Mean: 0
JMP Output
Step 5 Conclusion: There is no evidence that the fitness program increases flexibility.
One-Way Analysis of Variance
ANOVA is used to determine whether three or more populations have different means.
[Figure: responses for medical treatments A, B, and C]
ANOVA Strategy
The first step is to use the ANOVA F test to
determine if there are any significant differences
among the population means.
If the ANOVA F test shows that the population
means are not all the same, then follow up tests
can be performed to see which pairs of population
means differ.
One-Way ANOVA Model
$$y_{ij} = \mu_i + \varepsilon_{ij}$$
where
$y_{ij}$ is the response of the jth trial on the ith factor level,
$\mu_i$ is the mean of the ith group,
$\varepsilon_{ij} \sim N(0, \sigma^2)$,
$i = 1, \ldots, r$ and $j = 1, \ldots, n_i$.
In other words, for each group the observed value is the group mean plus some random variation.
One-Way ANOVA Hypothesis
Step 1: We test whether there is a difference in the population means.
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_r$$
$$H_a: \text{The } \mu_i \text{ are not all equal.}$$
Step 2: Check ANOVA Assumptions
• The samples are random and independent of each other.
• The populations are normally distributed.
• The populations all have the same standard deviation.
The ANOVA F test is robust to moderate departures from the assumptions of normality and equal standard deviations.
Step 3: ANOVA F Test
Compare the variation within the samples to the variation between the samples.
[Figure: two panels of responses for medical treatments A, B, and C]
ANOVA Test Statistic
$$F = \frac{\text{Variation between groups}}{\text{Variation within groups}} = \frac{MSG}{MSE}$$
Variation within groups small compared with variation between groups → Large F
Variation within groups large compared with variation between groups → Small F
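MSG
$$MSG = \frac{SSG}{r - 1} = \frac{n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2 + \cdots + n_r(\bar{y}_r - \bar{y})^2}{r - 1}$$
where $\bar{y}_i$ is the sample mean of group i and $\bar{y}$ is the overall sample mean.
The mean square for groups, MSG, measures the variability of the sample averages.
SSG stands for sums of squares groups.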
MSE
$$MSE = \frac{SSE}{n - r} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_r - 1)s_r^2}{n - r}$$
where
$$s_i = \sqrt{\frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}}$$
Mean square error, MSE, measures the variability within the groups.
SSE stands for sums of squares error.
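Putting MSG and MSE together, the F statistic can be sketched in plain Python as below. The function name `one_way_anova_f` and the three small groups are hypothetical, and the sketch stops at the test statistic; the p-value in Step 4 would still come from software.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_groups, df_error) for a one-way ANOVA (hypothetical helper)."""
    r = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # SSG: weighted squared deviations of group means from the overall mean
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: pooled within-group variation, (n_i - 1) * s_i^2 summed over groups
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msg = ssg / (r - 1)
    mse = sse / (n - r)
    return msg / mse, r - 1, n - r


# Made-up responses for three treatments, used only to show the call.
drug_a = [62, 60, 63, 59]
drug_b = [63, 67, 71, 64]
drug_c = [68, 66, 71, 67]
f_stat, df_groups, df_error = one_way_anova_f([drug_a, drug_b, drug_c])
print(f"F = {f_stat:.3f} on ({df_groups}, {df_error}) degrees of freedom")
```

A large F means the group means vary more than the within-group variation would suggest, which matches the "Large F" case described for the test statistic.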
Steps 4-5
Step 4: Calculate the p-value.
Step 5: Write a conclusion.
ANOVA Example
A researcher would like to determine if three drugs provide the same relief from pain.
60 patients are randomly assigned to a treatment (20 people in each treatment).
Step 1: Formulate the Hypotheses
H0: μDrug A = μDrug B = μDrug C
Ha: The μi are not all equal.
http://office.microsoft.com/en-us/images
Steps 2-4: JMP Demonstration
Analyze → Fit Y By X
Y, Response: Pain
X, Factor: Drug
Normal Quantile Plot → Plot Actual by Quantile
Means/ANOVA
JMP Output and Conclusion
Step 5 Conclusion: There is strong evidence that the mean pain levels for the drugs are not all the same.
[Figure: pain by drug (Drug A, Drug B, Drug C), with normal quantile plots for each drug]
Follow-Up Test
The p-value of the overall F test indicates that the level of pain is not the same for patients taking drugs A, B and C.
We would like to know which pairs of treatments are different.
One method is to use Tukey’s HSD (honestly significant differences).
Tukey Tests
Tukey’s test simultaneously tests
$$H_0: \mu_i = \mu_{i'} \quad \text{vs.} \quad H_a: \mu_i \neq \mu_{i'}$$
for all pairs of factor levels. Tukey’s HSD controls the overall type I error.
JMP demonstration
Oneway Analysis of Pain By Drug → Compare Means → All Pairs, Tukey HSD
JMP Output
The JMP output shows that drugs A and C are significantly different.
Level   - Level  Difference  Std Err Dif  Lower CL  Upper CL  p-Value
Drug C  Drug A   5.850000    1.677665     1.81283   9.887173  0.0027*
Drug C  Drug B   3.600000    1.677665     -0.43717  7.637173  0.0897
Drug B  Drug A   2.250000    1.677665     -1.78717  6.287173  0.3786
Two-Way Analysis of Variance
We are interested in the effect of two categorical factors on the response.
We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.
◦ An interaction effect means that the effect on the response of one factor depends on the level of the other factor.
Interaction
[Figure: improvement versus dosage (low, high) for Drug A and Drug B, in two panels: "No Interaction" and "Interaction"]
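Two-Way ANOVA Model
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where
$y_{ijk}$ is the response of the kth trial on the ith factor A level and the jth factor B level,
$\mu$ is the overall mean,
$\alpha_i$ is the main effect of the ith level of factor A,
$\beta_j$ is the main effect of the jth level of factor B,
$(\alpha\beta)_{ij}$ is the interaction effect of the ith level of factor A and the jth level of factor B,
$\varepsilon_{ijk} \sim N(0, \sigma^2)$,
$i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, n_{ij}$.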
Two-Way ANOVA Example
We would like to determine the effect of two alloys (low, high) and three cooling temperatures (low, medium, high) on the strength of a wire.
JMP demonstration
Analyze → Fit Model
Y: Strength
Highlight Alloy and Temp and click Macros → Factorial to Degree
Run Model
http://office.microsoft.com/en-us/images
JMP Output
Conclusion: There is strong evidence of an interaction between alloy and temperature.
Conclusion
The one sample t-test allows us to test whether the population mean of a group is equal to a specified value.
The two-sample t-test and paired t-test allow us to determine if the population means of two groups are different.
ANOVA allows us to determine whether the population means of several groups are different.
SAS, SPSS and R
For information about using SAS, SPSS and R to do ANOVA:
http://www.ats.ucla.edu/stat/sas/topics/anova.htm
http://www.ats.ucla.edu/stat/spss/topics/anova.htm
http://www.ats.ucla.edu/stat/r/sk/books_pra.htm
References
Fisher’s Irises Data (used in one sample and two sample t-test examples).
Flexibility data (paired t-test example): Michael Sullivan III. Statistics: Informed Decisions Using Data. Upper Saddle River, New Jersey: Pearson Education, 2004: 602.
Special thanks to Jennifer Kensler for course materials and help with JMP!