The problem of sampling error in psychological research
• We previously noted that sampling error is problematic in psychological research because differences observed between experimental conditions could be due to real differences or sampling error.
Example of the problem
• Example: We want to know whether psychotherapy increases people’s psychological well-being. The average well-being of Chicagoans is 3.00 (SD = 1). We give a randomly sampled group of 25 Chicagoans therapy. Months later, we measure their well-being. The average well-being in this sample is 3.50.
Big Question
• Is the .50 difference between the therapy group and Chicagoans in general a result of therapy or an “accident” of sampling error?
• Note: There are two hypotheses implied in this question:
– The sample comes from a population in which the mean is 3.00, and the difference we observed is due to sampling error. (Often called the “null hypothesis.”)
– The sample does not come from a population in which the mean is 3.00; the difference is due to therapy. (Often called the “research hypothesis” or “alternative hypothesis.”)
• How can we determine which of these hypotheses is most likely to be true?
• The most popular tools for answering this kind of question are called Null-Hypothesis Significance Tests (NHSTs).
• Significance tests are a broad set of quantitative techniques for evaluating the probability of observing the data under the assumption that the null hypothesis is true. This information is used to make a binary (yes/no) decision about whether the null hypothesis is a viable explanation for the study results.
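• In conventional notation (implied by the example, though not written out in the original slides), the two hypotheses are:

    H0: mu = 3.00    (any difference from 3.00 is due to sampling error)
    H1: mu ≠ 3.00    (the sample comes from a population with a different mean)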
Basic Logic of NHST
• If we assume the null hypothesis is true (e.g., the difference between our sample mean and the population mean is due to sampling error), then we can generate a sampling distribution that characterizes the distribution of sample means we might expect to observe.
• That is, if we make certain assumptions about the population (e.g., mu = 3.00) and the sampling process (e.g., random sampling, N = 25), we can determine (a) the expected sample mean and (b) how far an observed sample mean should be expected to fall from the population mean due to sampling error alone (see the simulation sketch after this list).
• If the probability of observing our sample mean under these assumptions is “small,” we reject the null hypothesis.
• If the probability of observing our sample mean under these assumptions is “large,” we accept the null hypothesis.
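• A minimal simulation sketch of this logic (Python with numpy; the code and variable names are illustrative, with the population values taken from the running example):

    # Simulate the sampling distribution of the mean under the null hypothesis:
    # many random samples of N = 25 from a population with mu = 3.00, SD = 1.
    import numpy as np

    rng = np.random.default_rng(0)
    sample_means = rng.normal(loc=3.0, scale=1.0, size=(100_000, 25)).mean(axis=1)

    # How often does sampling error alone produce a mean of 3.50 or higher?
    print((sample_means >= 3.50).mean())  # roughly .006, a small probability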
Sampling distribution for the mean
[Figure: normal curve of sample means on the well-being scale (1 to 5), centered at the mean of 3.00 with SD (SE) = 0.20 (1/sqrt(25) = .20); the marked areas under the curve cover roughly 34%, 14%, and 2% of sample means per successive standard error from the mean.]
• Recall that we can find the proportion of sample means that fall between specific values.
• We can interpret these as probabilities in the relative frequency sense of the term.
• We can use these probabilities to determine how likely it is that we will observe a range of sample means based on sampling error alone. (Note: This is the same logic we used when we constructed confidence intervals in the last lecture.)
• In our example, the probability of observing a sample mean between 2.8 and 3.2 is 68%.
• The probability of observing a sample mean equal to or greater than 3.5 is approximately 1%.
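• Both probabilities can be recovered analytically from the normal sampling distribution; a sketch using scipy.stats (the mean of 3.00 and SE of 0.20 come from the example above):

    from scipy.stats import norm

    # Sampling distribution of the mean: normal with mean 3.00 and SE 0.20
    p_within_1se = norm.cdf(3.2, loc=3.0, scale=0.2) - norm.cdf(2.8, loc=3.0, scale=0.2)
    p_3_5_or_more = 1 - norm.cdf(3.5, loc=3.0, scale=0.2)

    print(p_within_1se)   # ~0.68, the 68% quoted above
    print(p_3_5_or_more)  # ~0.006, rounded to "approximately 1%" above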
How NHSTs work
• Is 1% a “small” probability?
• Because the distribution of sample means is continuous, we must choose an arbitrary cutoff along this continuum to separate what counts as “small” from what counts as “large.”
• By convention, if the probability of observing the sample mean is less than 5%, researchers reject the null hypothesis.
Rules of the NHST Game
• This probability value is often called a p-value or p.
• When p < .05, a result is said to be “statistically significant.”
• In short, when a result is statistically significant (p < .05), we conclude that the difference we observed was unlikely to be due to sampling error alone. We “reject the null hypothesis.”
• If the result is not statistically significant (p > .05), we conclude that sampling error is a plausible interpretation of the results. We “fail to reject the null hypothesis.” (A code sketch of this decision rule follows below.)
• It is important to keep in mind that NHSTs were developed for the purpose of making yes/no decisions about the null hypothesis.
• As a consequence, the null is either accepted or rejected on the basis of the p-value.
• For logical reasons, some people are uneasy “accepting the null hypothesis” when p > .05, and prefer to say that they “failed to reject the null hypothesis” instead.
– This seems unnecessarily cumbersome to me, and divorces the technique from its original decision-making purpose.
– In this class, please feel free to use whichever phrase seems most sensible to you.
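• A minimal sketch of the decision rule described above (Python; the function name and ALPHA constant are illustrative, not from the slides):

    ALPHA = 0.05  # conventional cutoff for what counts as "small"

    def nhst_decision(p_value, alpha=ALPHA):
        # Binary (yes/no) decision about the null hypothesis
        if p_value < alpha:
            return "statistically significant: reject the null hypothesis"
        return "not significant: fail to reject (accept) the null hypothesis"

    print(nhst_decision(0.006))  # the running example: p ~ .006 < .05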
Points of Interest
• The example we explored previously was an example of what is called a z-test of a sample mean.
• Significance tests have been developed for a number of statistics (see the code sketch after this list):
– difference between two group means: t-test
– difference between two or more group means: ANOVA
– differences between proportions: chi-square
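• As an illustration (not from the slides), here is how these tests are commonly run with scipy.stats; the data below are made up:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    g1 = rng.normal(3.0, 1.0, 30)   # e.g., well-being scores, control group
    g2 = rng.normal(3.5, 1.0, 30)   # e.g., therapy group
    g3 = rng.normal(3.2, 1.0, 30)   # e.g., a second treatment group

    t, p = stats.ttest_ind(g1, g2)       # t-test: two group means
    f, p = stats.f_oneway(g1, g2, g3)    # ANOVA: two or more group means
    counts = [[20, 30],                  # chi-square: proportions in a
              [35, 15]]                  # 2 x 2 table of frequencies
    chi2, p, dof, expected = stats.chi2_contingency(counts)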
• We’ll discuss some common problems and misinterpretations of p-values and NHSTs in two weeks, but there are a few things you should bear in mind in the meantime:
• (1) The term “significant” does not mean important, substantial, or worthwhile.
• (2) The null and alternative hypotheses are often constructed to be mutually exclusive. If one is true, the other must be false.
• As a consequence:
– When you reject the null hypothesis, you accept the alternative.
– When you accept the null hypothesis, you reject the alternative.
• This may seem tricky because NHSTs do not test the research hypothesis per se. Formally, only the null hypothesis is tested.
• In addition, the logical problems discussed previously are relevant here.
• (3) Because NHSTs are often used to make a yes/no decision about whether the null hypothesis is a viable explanation, mistakes can be made.
Inferential Errors and NHST

                                Real World:
                                Null is true        Null is false
Conclusion of   Null is true    Correct decision    Type II error
the test:       Null is false   Type I error        Correct decision
Errors in Inference using NHST
• Type I error: Your test is significant (p < .05), so you reject the null hypothesis, but the null hypothesis is actually true.
• Type II error: Your test is not significant (p > .05), so you don’t reject the null hypothesis, but you should have, because it is actually false.
• The probability of making a Type I error is determined by the experimenter. Often called the alpha value. Usually set to 5%.
• The probability of making a Type II error is influenced by the experimenter’s design choices (especially sample size), along with the true effect size. Often called the beta value. Usually ignored by researchers.
• The complement of the Type II error rate is called power: the probability of rejecting the null hypothesis when it is false (a correct decision). Power = 1 - beta.
• Power is strongly influenced by sample size: with a larger N, we are more likely to reject the null hypothesis when it is false (see the simulation sketch below).
• Note: N does not influence the likelihood of making a correct decision if the null hypothesis is true (i.e., not rejecting the null). This probability is always equal to 1 - alpha, regardless of sample size.
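• A Monte Carlo sketch of the power/sample-size relationship (Python; the effect size of 0.5 follows the running example, but the function itself is illustrative):

    import numpy as np
    from scipy.stats import norm

    def estimated_power(n, mu_true=3.5, mu0=3.0, sigma=1.0, alpha=0.05, sims=10_000):
        # Simulate many studies of size n and count how often a two-tailed
        # z-test rejects the (false) null hypothesis that mu = mu0.
        rng = np.random.default_rng(0)
        means = rng.normal(mu_true, sigma / np.sqrt(n), sims)
        z = (means - mu0) / (sigma / np.sqrt(n))
        return np.mean(np.abs(z) > norm.ppf(1 - alpha / 2))

    for n in (10, 25, 100):
        print(n, estimated_power(n))  # power climbs toward 1 as N grows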