MARE 250 Dr. Jason Turner
Hypothesis Testing III
To ASSUME is to make an…
Four assumptions for t-test hypothesis testing:
1. Random Samples
2. Independent Samples
3. Normal Populations (or large samples)
4. Variances (std. dev.) are equal
When do I do the what now?
If all 4 assumptions are met: Conduct a pooled t-test - you can “pool” the samples because the variances are assumed to be equal
If the samples are not independent: Conduct a paired t-test
If the variances (std. dev.) are not equal: Conduct a non-pooled t-test
If the data are not normal and the sample size is small: Conduct a non-parametric test (Mann-Whitney)
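The checklist above can be sketched in Python. This is only an illustration: the function names, the made-up data, and the order in which the assumptions are checked are assumptions, not part of the course material. It also shows how the pooled and non-pooled t statistics differ in the way they estimate the standard error.

```python
from math import sqrt
from statistics import mean, variance


def choose_test(independent, normal_or_large, equal_variances):
    """Return the test suggested by the checklist.

    The order of the checks is an assumption; the slides do not rank them.
    """
    if not independent:
        return "paired t-test"
    if not normal_or_large:
        return "Mann-Whitney test"
    if not equal_variances:
        return "non-pooled t-test"
    return "pooled t-test"


def pooled_t(a, b):
    """t statistic using one pooled variance estimate (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def non_pooled_t(a, b):
    """t statistic that keeps the two sample variances separate."""
    n1, n2 = len(a), len(b)
    return (mean(a) - mean(b)) / sqrt(variance(a) / n1 + variance(b) / n2)


# Made-up measurements, for illustration only
site_a = [1, 2, 3, 4, 5]
site_b = [2, 4, 6, 8, 10, 12]

print(choose_test(independent=True, normal_or_large=True, equal_variances=False))
print("pooled t:    ", round(pooled_t(site_a, site_b), 3))
print("non-pooled t:", round(non_pooled_t(site_a, site_b), 3))
```

When the two samples have the same size, the two statistics are identical; they differ only in their degrees of freedom.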
Significance Level
The probability of making a TYPE I Error (rejection of a true null hypothesis) is called the significance level (α) of a hypothesis test
TYPE II Error Probability (β) – nonrejection of a false null hypothesis
For a fixed sample size, the smaller we specify the significance level (α), the larger will be the probability (β) of not rejecting a false null hypothesis
                          If H0 is true     If H0 is false
If H0 is rejected         TYPE I ERROR      No Error
If H0 is not rejected     No Error          TYPE II ERROR
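The trade-off between α and β can be checked numerically. The sketch below assumes a one-sided, one-sample z-test with a known standard deviation, which is simpler than the t-tests in these slides but shows the same behavior. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, effect, sigma, n):
    """β for a one-sided, one-sample z-test with known sigma (an assumed setup)."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)    # rejection cutoff on the z scale
    shift = effect * sqrt(n) / sigma   # how far the true effect moves the test statistic
    return z.cdf(critical - shift)     # chance the statistic stays below the cutoff


# Fixed sample size and effect; only α changes
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect=0.5, sigma=1.0, n=20)
    print(f"alpha = {alpha:.2f}   beta = {beta:.3f}   power = {1 - beta:.3f}")
```

Each step down in α pushes the cutoff further out, so β goes up and the power goes down.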
I have the POWER!!!
The power of a hypothesis test is the probability of not making a TYPE II error (that is, of rejecting a false null hypothesis) and so finding sufficient evidence to support the alternative hypothesis
POWER = 1 - β
Produce a power curve
We need more POWER!!!
For a fixed significance level, increasing the sample size increases the power
Therefore, you can run a test to determine if your sample size HAS THE POWER!!!
By using a sufficiently large sample size, we can obtain a hypothesis test with as much power as we want
Power - the probability of being able to detect an effect of a given size
Sample size - the number of observations in each sample
Difference (effect) - the difference between μ for one population and μ for the other
Increasing the power of the test
There are four factors that can increase the power of a two-sample t-test:
1. Larger effect size (difference) - The greater the real difference between μ for the two populations, the more likely it is that the sample means will also be different.
2. Higher α-level (the level of significance) - If you choose a higher value for α, you increase the probability of rejecting the null hypothesis, and thus the power of the test. (However, you also increase your chance of type I error.)
3. Less variability - When the standard deviation is smaller, smaller differences can be detected.
4. Larger sample sizes - The more observations there are in your samples, the more confident you can be that the sample means represent μ for the two populations. Thus, the test will be more sensitive to smaller differences.
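The four factors can be tried out one at a time with a rough power calculation. This sketch uses a normal approximation to the two-sample test and assumes both populations share the same standard deviation. The function name and the baseline numbers are hypothetical, and MINITAB's own t-based calculation will give somewhat different values.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(diff, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample test with n observations per sample.

    Normal approximation; assumes a common standard deviation sigma.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / (sigma * sqrt(2 / n))
    # Probability that the test statistic lands in either rejection region
    return z.cdf(shift - critical) + z.cdf(-shift - critical)


base = {"diff": 1.0, "sigma": 2.0, "n": 20, "alpha": 0.05}
scenarios = {
    "baseline": base,
    "1. larger difference": {**base, "diff": 2.0},
    "2. higher alpha": {**base, "alpha": 0.10},
    "3. less variability": {**base, "sigma": 1.0},
    "4. larger samples": {**base, "n": 80},
}
for label, settings in scenarios.items():
    print(f"{label:22s} power = {power_two_sample(**settings):.3f}")
```

Running the same function over a range of differences, with the other settings held fixed, gives the points of a power curve.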
Increasing the power of the test
The most practical way to increase power is often to increase the sample size
However, you can also try to decrease the standard deviation by making improvements in your process or measurement
Sample size
Increasing the size of your samples increases the power of your test
You want enough observations in your samples to achieve adequate power, but not so many that you waste time and money on unnecessary sampling
If you provide the power that you want the test to have and the difference you want it to be able to detect, MINITAB will calculate how large your samples must be
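A rough version of that sample size calculation can be written by hand. The sketch below uses the textbook normal-approximation formula n = 2(z₁₋α/₂ + z_power)²σ²/difference² for each sample. It is not MINITAB's algorithm, which works with the t distribution and may report a slightly larger n. The function name and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(diff, sigma, power=0.80, alpha=0.05):
    """Observations needed in EACH sample to detect `diff` with the requested power.

    Normal approximation for a two-sided, two-sample test with a common sigma.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / diff) ** 2)


# Smaller differences need many more observations to detect
for diff in (1.0, 0.5, 0.25):
    print(f"difference = {diff:<5} n per sample = {sample_size_per_group(diff, sigma=1.0)}")
```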
Data Transformations
One advantage of using parametric statistics is that it makes it much easier to describe your data
If you have established that your data follow a normal distribution, you can be sure that a particular set of measurements can be properly described by its mean and standard deviation
If your data are not normally distributed you cannot use any of the tests that assume that they are (e.g. ANOVA, t-test, regression analysis)
Data Transformations
If your data are not normally distributed it is often possible to normalize them by transforming them.
Transforming data to allow you to use parametric statistics is completely legitimate
People often feel uncomfortable when they transform data because it seems like it artificially improves their results but this is only because they feel comfortable with linear or arithmetic scales
However, there is no reason for not using other scales (e.g. logarithms, square roots, reciprocals or angles) where appropriate (See Chapter 13)
Data Transformations
Different transformations work for different data types:
Logarithms: Growth rates are often exponential and log transforms will often normalize them. Log transforms are particularly appropriate if the variance increases with the mean.
Reciprocal: If a log transform does not normalize your data you could try a reciprocal (1/x) transformation. This is often used for enzyme reaction rate data.
Square root: This transform is often of value when the data are counts, e.g. # urchins, # Honu. Carrying out a square root transform will often bring data with a Poisson distribution closer to a normal distribution.
Arcsine: This transformation is also known as the angular transformation and is especially useful for percentages and proportions.
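The four transformations are simple to apply by hand. The sketch below uses only Python's standard library. The function names and the example counts are made up for illustration, and the arcsine transform is written in its usual form, arcsin(√p), for a proportion p between 0 and 1.

```python
from math import asin, log, sqrt


def log_transform(x):
    """Natural log; needs x > 0."""
    return log(x)


def reciprocal_transform(x):
    """1/x; needs x != 0."""
    return 1 / x


def sqrt_transform(x):
    """Square root; intended for counts (x >= 0)."""
    return sqrt(x)


def arcsine_transform(p):
    """Angular transform arcsin(sqrt(p)), in radians, for a proportion 0 <= p <= 1."""
    return asin(sqrt(p))


# Made-up urchin counts and percent cover, for illustration only
urchin_counts = [0, 1, 4, 9, 25]
percent_cover = [5, 20, 50, 80, 95]

print([sqrt_transform(c) for c in urchin_counts])
# Percentages must be converted to proportions before the arcsine transform
print([round(arcsine_transform(pc / 100), 3) for pc in percent_cover])
```

After transforming, the data should be checked for normality again, since no transformation is guaranteed to work.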
Data Transformations
Which Transformation?
Johnson Transformation is useful when the collected data are non-normal, but you want to apply a methodology that requires a normal distribution
MINITAB selects one optimal distribution function from the Johnson distribution system to transform the data to a normal distribution
[Figure: Johnson Transformation for Males. Panels: Probability Plot for Original Data (N 31, AD 0.876, P-Value 0.022); Probability Plot for Transformed Data (N 31, AD 0.176, P-Value 0.915); Select a Transformation (P-Value for AD test vs. Z Value, with reference P marked and best Z = 0.96; a P-Value of 0.005 means <= 0.005)]
P-Value for Best Fit: 0.915249
Z for Best Fit: 0.96
Best Transformation Type: SL
Transformation function equals -10.8734 + 2.58295 * Log( X + 5.34335 )
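The transformation function reported in the MINITAB output for this example can be applied outside MINITAB as well. The sketch below copies the three coefficients from that output and assumes that "Log" means the natural logarithm, which is consistent with the range of the transformed probability plot but is not stated in the output. The function name and the sample values are hypothetical.

```python
from math import log

# Coefficients copied from the MINITAB output (Best Transformation Type: SL)
INTERCEPT = -10.8734
SLOPE = 2.58295
OFFSET = 5.34335


def johnson_sl(x):
    """Apply the reported SL transformation; assumes Log is the natural log."""
    return INTERCEPT + SLOPE * log(x + OFFSET)


# Made-up measurements on the original scale, for illustration only
for x in (30, 60, 100, 200):
    print(f"x = {x:>3}   transformed = {johnson_sl(x):.3f}")
```

The coefficients are specific to this data set; a different variable needs its own Johnson Transformation run.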
Johnson Transformation
Determining the optimal transformation
Johnson Transformation uses an algorithm to determine an optimal transformation from three families of distributions: SB, SL, and SU, where B, L, and U refer to the variable being bounded, lognormal, and unbounded
Bounded - a variable is bounded if it is of finite size, that is, if its values are smaller than some other value that is itself finite. (Otherwise it is unbounded.)
Johnson Transformation
How To?
STAT – Quality Tools – Johnson Transformation
Enter the variable to be transformed, and what the “new” transformed variable will be called
Places transformed data into a new column in your MINITAB datasheet – can copy this into Excel and save FOREVER…