
phd assign


Page 1: phd assign

8/7/2019 phd assign.

http://slidepdf.com/reader/full/phd-assign 1/35

1

1. Compare and Contrast

a. The measures of Central Tendency

b. The measures of Variability

Central tendency is a statistical measure that identifies a single score as representative of an

entire distribution of scores. The goal of central tendency is to find the single score that is most

typical or most representative of the entire distribution. Unfortunately, there is no single,

standard procedure for determining central tendency. The problem is that there is no single

measure that will always produce a central, representative value in every situation. There are

three main measures of central tendency: the arithmetic mean, the median and the mode.

The mean of a set of scores (abbreviated M) is the most common and useful measure of 

central tendency. The mean is the sum of the scores divided by the total number of scores. The

mean is commonly known as the arithmetic average. The mean can only be used for variables

at the interval or ratio levels of measurement. The mean of [2 6 2 10] is (2 + 6 + 2 + 10)/4 = 20/4 

= 5. One can think of the mean as the balance point of a distribution (the center of gravity). It

balances the distances of observations to the mean. Another measure of central tendency is

the median, which is defined as the middle value when the numbers are arranged in increasing

or decreasing order. The median is the score that divides the distribution of scores exactly in

half. The median is also the 50th percentile. The median can be used for variables at the

ordinal, interval or ratio levels of measurement. If for example, daily expenses are $50, $100,

$150, $350, $350, the middle value is $150, and therefore $150 is the median. For an odd number

of values, the median is the middle value. If there is an even number of items in a set, the median is the average of the two middle values. For example, if we had four values ($50, $100, $150,

$350), the median would be the average of the two middle values, $100 and $150; thus, $125 is

the median in that case. The median may sometimes be a better indicator of central tendency

than the mean, especially when there are extreme values. Another indicator of central

tendency is the mode, or the value that occurs most often in a set of numbers. In other words,

the mode is the score or category of scores in a frequency distribution that has the greatest

frequency. In the set of expenses mentioned above, the mode would be $350 because it

appears twice and the other values appear only once. The mode can be used for variables at

any level of measurement (nominal, ordinal, interval or ratio). Sometimes a distribution has

more than one mode. Such a distribution is called multimodal. A distribution with two modes is called bimodal. Note that the modes do not have to have the same frequencies. The tallest

peak is called the major mode; other peaks are called minor modes. Some distributions do not

have modes. A rectangular distribution has no mode. Some distributions have many peaks and

valleys.
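The three measures described above can be checked numerically. Here is a minimal sketch (not part of the original text) using Python's standard-library statistics module on the daily-expenses example from the text:

```python
import statistics

expenses = [50, 100, 150, 350, 350]  # daily expenses from the text

mean = statistics.mean(expenses)      # sum / count = 1000 / 5 -> 200
median = statistics.median(expenses)  # middle value of the sorted list -> 150
mode = statistics.mode(expenses)      # most frequent value -> 350

# With an even number of items, the median averages the two middle values:
even_median = statistics.median([50, 100, 150, 350])  # (100 + 150) / 2 -> 125.0

# multimode() returns every mode of a multimodal set:
modes = statistics.multimode([1, 1, 2, 3, 3])  # -> [1, 3]

print(mean, median, mode, even_median, modes)
```

Note that statistics.mode raises an error only on empty data; for multimodal data it returns the first mode encountered, which is why multimode is shown for the bimodal case.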

Variability provides a quantitative measure of the degree to which scores in a

distribution are spread out. The greater the difference between scores, the more spread out

the distribution is. The more tightly the scores group together, the less variability there is in the

distribution. Variability is the essence of statistics. The most frequently used measures of

variability are the range, the interquartile range, the variance and the

standard deviation. The range is simply the difference between the highest score and the lowest score in a distribution (some texts add one, making the range inclusive of both endpoints). This statistic can be calculated for measurements that

are on an interval scale or above. In a dataset with 10 numbers {99, 45, 23, 67, 45, 91, 82, 78, 62, 51},

the highest number is 99 and the lowest number is 23, so 99 − 23 = 76; the range is 76. The

interquartile range (IQR) is a range that contains the middle 50% of the scores in a distribution.

It is computed as follows: IQR = 75th percentile − 25th percentile. A related measure of variability

is called the semi-interquartile range. The semi-interquartile range is defined simply as the


interquartile range divided by 2. Variance can be defined as a measure of how close the scores

in the distribution are to the middle of the distribution. Using the mean as the measure of the

middle of the distribution, the variance is defined as the average squared difference of the

scores from the mean. When the scores are spread out or heterogeneous, the measure of 

variability should be large. When the scores are homogeneous the variability should be smaller.

Another measure of variability is the standard deviation. The standard deviation is simply the square root of the variance. The standard deviation is an especially useful measure of variability

when the distribution is normal or approximately normal (see Probability) because the

proportion of the distribution within a given number of standard deviations from the mean can

be calculated. The standard deviation can therefore be thought of, roughly, as the typical distance from the mean. So the

mean is the representative value, and the standard deviation is the representative distance of 

any one point in the distribution from the mean.
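The variability measures above can be sketched in a few lines of Python (this is an illustration, not part of the original text). The IQR uses statistics.quantiles, whose default "exclusive" quartile method is an assumption, since the text does not fix a percentile convention:

```python
import statistics

data = [99, 45, 23, 67, 45, 91, 82, 78, 62, 51]  # the 10-number dataset from the text

value_range = max(data) - min(data)    # 99 - 23 -> 76
variance = statistics.pvariance(data)  # average squared deviation from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

# Quartiles; quantiles() defaults to the "exclusive" method, so this IQR
# may differ slightly from other percentile conventions.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
semi_iqr = iqr / 2  # semi-interquartile range: IQR divided by 2

print(value_range, variance, round(std_dev, 2))  # 76 505.81 22.49
print(iqr, semi_iqr)
```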

While the measures of central tendency convey information about the commonalties of 

measured properties, the measures of variability quantify the degree to which they differ. If not

all values of data are the same, they differ and variability exists. The measures of central

tendency should be complemented by measures of variability for the same reason.

2. When are the 3 measures of Central Tendency used?

Mean

The mean is the most commonly used measure of central tendency. When we talk about an

"average", we usually are referring to the mean. The mean is simply the sum of the values

divided by the total number of items in the set. The result is referred to as the arithmetic mean.

Sometimes it is useful to give more weighting to certain data points, in which case the result is

called the weighted arithmetic mean.

The notation used to express the mean depends on whether we are talking about the

population mean or the sample mean:

μ = population mean

x̄ = sample mean

The population mean then is defined as:

μ = Σ xᵢ / N

where

N = number of data points in the population

xᵢ = value of each data point i.

The mean is valid only for interval data or ratio data. Since it uses the values of all of the data

points in the population or sample, the mean is influenced by outliers that may be at the

extremes of the data set.


Median

The median is determined by sorting the data set from lowest to highest values and taking the

data point in the middle of the sequence. There is an equal number of points above and below

the median. For example, in the data set {1,2,3,4,5} the median is 3; there are two data points

greater than this value and two data points less than this value. In this case, the median is equal to the mean. But consider the data set {1,2,3,4,10}. In this dataset, the median is still three, but

the mean is equal to 4. If there is an even number of data points in the set, then there is no

single point at the middle and the median is calculated by taking the mean of the two middle

points.

The median can be determined for ordinal data as well as interval and ratio data. Unlike the

mean, the median is not influenced by outliers at the extremes of the data set. For this reason,

the median often is used when there are a few extreme values that could greatly influence the

mean and distort what might be considered typical. This often is the case with home prices and

with income data for a group of people, which often is very skewed. For such data, the median often is reported instead of the mean. For example, in a group of people, if the salary of one

person is 10 times the mean, the mean salary of the group will be higher because of the

unusually large salary. In this case, the median may better represent the typical salary level of 

the group.

Mode

The mode is the most frequently occurring value in the data set. For example, in the data set

{1,2,3,4,4}, the mode is equal to 4. A data set can have more than a single mode, in which case

it is multimodal. In the data set {1,1,2,3,3} there are two modes: 1 and 3.

The mode can be very useful for dealing with categorical data. For example, if a sandwich shop

sells 10 different types of sandwiches, the mode would represent the most popular sandwich.

The mode also can be used with ordinal, interval, and ratio data. However, in interval and ratio

scales, the data may be spread thinly with no data points having the same value. In such cases,

the mode may not exist or may not be very meaningful.

When to use Mean, Median, and Mode

The following table summarizes the appropriate methods of determining the middle or typical

value of a data set based on the measurement scale of the data.

Measurement Scale      Best Measure of the "Middle"

Nominal (Categorical)  Mode

Ordinal                Median


Interval               Symmetrical data: Mean; Skewed data: Median

Ratio                  Symmetrical data: Mean; Skewed data: Median

3. What is Skewness? Explain each type and give examples.

Skewness

The first thing you usually notice about a distribution's shape is whether it has one mode (peak)

or more than one. If it's unimodal (has just one peak), like most data sets, the next thing you

notice is whether it's symmetric or skewed to one side. If the bulk of the data is at the left and

the right tail is longer, we say that the distribution is skewed right or positively skewed; if the

peak is toward the right and the left tail is longer, we say that the distribution is skewed left or

negatively skewed.

Look at the two graphs below. They both have mean μ = 0.6923 and standard deviation σ = 0.1685, but their shapes are

different.

(left graph) skewness = −0.5370        (right graph) skewness = +0.5370

The first one is moderately skewed left: the left tail is longer and most of the distribution is at

the right. By contrast, the second distribution is moderately skewed right: its right tail is longer

and most of the distribution is at the left.

You can get a general impression of skewness by drawing a histogram, but there are also some

common numerical measures of skewness. Some authors favor one, some favor another.

You may remember that the mean and standard deviation have the same units as the original

data, and the variance has the square of those units. However, the skewness has no units: it's a

pure number, like a z-score.


Computing

The moment coefficient of skewness of a data set is

(1) skewness: g1 = m3 / m2^(3/2)

where

m3 = Σ(x − x̄)³ / n and m2 = Σ(x − x̄)² / n

x̄ is the mean and n is the sample size, as usual. m3 is called the third moment of the data set.

m2 is the variance, the square of the standard deviation.

You'll remember that you have to choose one of two different measures of standard deviation,

depending on whether you have data for the whole population or just a sample. The same is

true of skewness. If you have the whole population, then g1 above is the measure of skewness.

But if you have just a sample, you need the sample skewness:

(2) sample skewness: G1 = [√(n(n−1)) / (n − 2)] × g1

source: D. N. Joanes and C. A. Gill, "Comparing Measures of Sample Skewness and Kurtosis",

The Statistician 47(1):183–189.

Excel doesn't concern itself with whether you have a sample or a population: its measure of

skewness is always G1.

Example 1: College Men's Heights

Here are grouped data for heights of 100 randomly selected

male students, adapted from Spiegel & Stephens, Theory and 

Problems of Statistics 3/e (McGraw-Hill, 1999), page 68.

A histogram shows that the

data are skewed left, not

symmetric.

But how highly skewed are

they, compared to other data

sets? To answer this question,

you have to compute the skewness.

Begin with the sample size and sample mean. (The sample size was given, but it never hurts to

check.) 

n = 5+18+42+27+8 = 100

x̄ = (61×5 + 64×18 + 67×42 + 70×27 + 73×8) ÷ 100

x̄ = (305 + 1152 + 2814 + 1890 + 584) ÷ 100

Height (inches)   Class Mark, x   Frequency, f
59.5–62.5         61              5
62.5–65.5         64              18
65.5–68.5         67              42
68.5–71.5         70              27
71.5–74.5         73              8


x̄ = 6745 ÷ 100 = 67.45

Now, with the mean in hand, you can compute the skewness. (Of course in real life you'd

probably use Excel or a statistics package, but it's good to know where the numbers come

from.) 

Class Mark, x   Frequency, f   xf      (x − x̄)   (x − x̄)²f   (x − x̄)³f
61              5              305     −6.45     208.01      −1341.68
64              18             1152    −3.45     214.25      −739.15
67              42             2814    −0.45     8.51        −3.83
70              27             1890    2.55      175.57      447.70
73              8              584     5.55      246.42      1367.63
Sum                            6745    n/a       852.75      −269.33
x̄, m2, m3                      67.45   n/a       8.5275      −2.6933

Finally, the skewness is

g1 = m3 / m2^(3/2)

= −2.6933 / 8.5275^(3/2)

= −0.1082

But wait, there's more! That would be the skewness if you had data for the whole

population. But obviously there are more than 100 male students in the world, or even in

almost any school, so what you have here is a sample, not the population. You must compute

the sample skewness:

G1 = [√(100×99) / 98] × [−2.6933 / 8.5275^(3/2)] = −0.1098
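The grouped-data computation above can be reproduced in a few lines. This is a sketch, not part of the original text; it implements formulas (1) and (2) for the class marks and frequencies of the heights table:

```python
from math import sqrt

marks = [61, 64, 67, 70, 73]   # class marks, x
freqs = [5, 18, 42, 27, 8]     # frequencies, f

n = sum(freqs)                                        # 100
mean = sum(x * f for x, f in zip(marks, freqs)) / n   # 67.45

# second and third moments about the mean
m2 = sum((x - mean) ** 2 * f for x, f in zip(marks, freqs)) / n
m3 = sum((x - mean) ** 3 * f for x, f in zip(marks, freqs)) / n

g1 = m3 / m2 ** 1.5                      # population skewness, formula (1)
G1 = sqrt(n * (n - 1)) / (n - 2) * g1    # sample skewness, formula (2)

print(round(g1, 4), round(G1, 4))  # -0.1082 -0.1098
```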

Interpreting

If skewness is positive, the data are positively skewed or skewed right, meaning that the right

tail of the distribution is longer than the left. If skewness is negative, the data are negatively

skewed or skewed left, meaning that the left tail is longer.

If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite

unlikely for real-world data, so how can you interpret the skewness number? Bulmer, M. G.,

Principles of Statistics (Dover, 1979), a classic, suggests this rule of thumb:


- If skewness is less than −1 or greater than +1, the distribution is highly skewed.

- If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately

skewed.

- If skewness is between −½ and +½, the distribution is approximately symmetric.

With a skewness of −0.1098, the sample data for student heights are approximately symmetric.

Caution: This is an interpretation of the data you actually have. When you have data for the

whole population, that's fine. But when you have a sample, the sample skewness doesn't

necessarily apply to the whole population. In that case the question is, from the sample

skewness, can you conclude anything about the population skewness? To answer that question,

see the next section.

Inferring

Your data set is just one sample drawn from a population. Maybe, from ordinary sample

variability, your sample is skewed even though the population is symmetric. But if the sample is

skewed too much for random chance to be the explanation, then you can conclude that there is

skewness in the population.

But what do I mean by "too much for random chance to be the explanation"? To answer that,

you need to divide the sample skewness G1 by the standard error of skewness (SES) to get the

test statistic, which measures how many standard errors separate the sample skewness from

zero:

(3) test statistic: Zg1 = G1 / SES, where SES = √[ (6 n (n−1)) / ((n−2)(n+1)(n+3)) ]

This formula is adapted from page 85 of Cramer, Duncan, Basic Statistics for Social Research 

(Routledge, 1997). (Some authors suggest √(6/n), but for small samples that's a poor

approximation. And anyway, we've all got calculators, so you may as well do it right.)

The critical value of Zg1 is approximately 2. (This is a two-tailed test of skewness ≠ 0 at roughly

the 0.05 significance level.) 

- If Zg1 < −2, the population is very likely skewed negatively (though you don't know by

how much).

- If Zg1 is between −2 and +2, you can't reach any conclusion about the skewness of the

population: it might be symmetric, or it might be skewed in either direction.

- If Zg1 > +2, the population is very likely skewed positively (though you don't know by how

much).

Don't mix up the meanings of this test statistic and the amount of skewness. The amount of

skewness tells you how highly skewed your sample is: the bigger the number, the bigger the

skew. The test statistic tells you whether the whole population is probably skewed, but not by

how much: the bigger the number, the higher the probability.

Estimating

GraphPad suggests a confidence interval for skewness:

(4) 95% confidence interval of population skewness = G1 ± 2 SES


For the college men's heights, recall that the sample skewness was G1 = −0.1098. The sample

size was n = 100 and therefore the standard error of skewness is

SES = √[ (6×100×99) / (98×101×103) ] = 0.2414

The test statistic is

Zg1 = G1/SES = −0.1098 / 0.2414 = −0.45

This is quite small, so it's impossible to say whether the population is symmetric or skewed.

Since the sample skewness is small, a confidence interval is probably reasonable:

G1 ± 2 SES = −0.1098 ± 2×0.2414 = −0.1098 ± 0.4828 = −0.5926 to +0.3730.

You can give a 95% confidence interval of skewness as about −0.59 to +0.37, more or less.
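Formulas (3) and (4) can be checked with a short sketch (the n and G1 values are taken from the worked example above; this code is an illustration, not part of the original text):

```python
from math import sqrt

n = 100
G1 = -0.1098  # sample skewness from the heights example

# standard error of skewness
SES = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

Z_g1 = G1 / SES                       # test statistic, formula (3)
lo = G1 - 2 * SES                     # 95% confidence interval, formula (4)
hi = G1 + 2 * SES

print(round(SES, 4), round(Z_g1, 2))  # 0.2414 -0.45
print(round(lo, 4), round(hi, 4))     # -0.5926 0.373
```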

4. Define Kurtosis. Differentiate each from the others and give examples.

Kurtosis

If a distribution is symmetric, the next question is about the central peak: is it high and sharp, or

short and broad? You can get some idea of this from the histogram, but a numerical measure is

more precise.

The height and sharpness of the peak relative to the rest of the data are measured by a

number called kurtosis. Higher values indicate a higher, sharper peak; lower values indicate a

lower, less distinct peak. This occurs because, as Wikipedia's article on kurtosis explains, higher kurtosis means more of the variability is due to a few extreme differences from the mean,

rather than a lot of modest differences from the mean.

Balanda and MacGillivray say the same thing in another way: increasing kurtosis is associated

with "the movement of probability mass from the shoulders of a distribution into its center

and tails". (Kevin P. Balanda and H. L. MacGillivray, "Kurtosis: A Critical Review", The American

Statistician 42:2 [May 1988], pp 111–119, drawn to my attention by Karl Ove Hufthammer)

You may remember that the mean and standard deviation have the same units as the original

data, and the variance has the square of those units. However, the kurtosis has no units: it's a

pure number, like a z-score.

The reference standard is a normal distribution, which has a kurtosis of 3. In token of this, often

the excess kurtosis is presented: excess kurtosis is simply kurtosis − 3. For example, the

kurtosis reported by Excel is actually the excess kurtosis.

- A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution

with kurtosis ≈ 3 (excess ≈ 0) is called mesokurtic.

- A distribution with kurtosis < 3 (excess kurtosis < 0) is called platykurtic. Compared to a

normal distribution, its central peak is lower and broader, and its tails are shorter and

thinner.

- A distribution with kurtosis > 3 (excess kurtosis > 0) is called leptokurtic. Compared to a

normal distribution, its central peak is higher and sharper, and its tails are longer and

fatter.


Visualizing

Kurtosis is unfortunately harder to picture than skewness, but these illustrations, suggested by

Wikipedia, should help. All three of these distributions have mean of 0, standard deviation of 1,

and skewness of 0, and all are plotted on the same horizontal and vertical scale. Look at the

progression from left to right, as kurtosis increases.

kurtosis = 1.8, excess = −1.2

Uniform(min=−√3, max=√3)

kurtosis = 3, excess = 0

Normal(μ=0, σ=1)

kurtosis = 4.2, excess = +1.2

Logistic(α=0, β=0.55153)

Moving from the illustrated uniform distribution to a normal distribution, you see that the

shoulders have transferred some of their mass to the center and the tails. In other words, the

intermediate values have become less likely and the central and extreme values have become

more likely. The kurtosis increases while the standard deviation stays the same, because more

of the variation is due to extreme values.

Moving from the normal distribution to the illustrated logistic distribution, the trend continues.

There is even less in the shoulders and even more in the tails, and the central peak is higher and

narrower.

How far can this go? What are the smallest and largest possible values of kurtosis? The

smallest possible kurtosis is 1 (excess kurtosis −2), and the largest is ∞, as shown here:

kurtosis = 1, excess = −2        kurtosis = ∞, excess = ∞

A discrete distribution with two equally likely outcomes, such as winning or losing on the flip of 

a coin, has the lowest possible kurtosis. It has no central peak and no real tails, and you could

say that it's all shoulder: it's as platykurtic as a distribution can be. At the other extreme,

Student's t distribution with four degrees of freedom has infinite kurtosis. A distribution can't

be any more leptokurtic than this.


Computing

The moment coefficient of kurtosis of a data set is computed almost the same way as the

coefficient of skewness: just change the exponent 3 to 4 in the formulas:

(5) kurtosis: a4 = m4 / m2²

and excess kurtosis: g2 = a4 − 3

where

m4 = Σ(x − x̄)⁴ / n and m2 = Σ(x − x̄)² / n

Again, the excess kurtosis is generally used because the excess kurtosis of a normal distribution

is 0. x̄ is the mean and n is the sample size, as usual. m4 is called the fourth moment of the data

set. m2 is the variance, the square of the standard deviation.

Just as with variance, standard deviation, and skewness, the above is the final computation if you

have data for the whole population. But if you have data for only a sample, you have to

compute the sample excess kurtosis using this formula, which comes from Joanes and Gill:

(6) sample excess kurtosis: G2 = [(n−1) / ((n−2)(n−3))] × [(n+1) g2 + 6]

Excel doesn't concern itself with whether you have a sample or a population: its measure of

kurtosis is always G2.

Example: Let's continue with the example of the college men's heights, and compute the

kurtosis of the data set. n = 100, x = 67.45 inches, and the variance m2 = 8.5275 in² were

computed earlier.

Class Mark, x   Frequency, f   x − x̄    (x − x̄)⁴f
61              5              −6.45    8653.84
64              18             −3.45    2550.05
67              42             −0.45    1.72
70              27             2.55     1141.63
73              8              5.55     7590.35
Sum                            n/a      19937.60
m4                             n/a      199.3760


Finally, the kurtosis is

a4 = m4 / m2² = 199.3760/8.5275² = 2.7418

and the excess kurtosis is

g2 = 2.7418 − 3 = −0.2582

But this is a sample, not the population, so you have to compute the sample excess kurtosis:

G2 = [99/(98×97)] × [101×(−0.2582) + 6] = −0.2091

This sample is slightly platykurtic: its peak is just a bit shallower than the peak of a normal

distribution.
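Formulas (5) and (6) can be reproduced on the same grouped heights data. This sketch is an illustration added here, not part of the original text:

```python
marks = [61, 64, 67, 70, 73]   # class marks, x
freqs = [5, 18, 42, 27, 8]     # frequencies, f

n = sum(freqs)                                        # 100
mean = sum(x * f for x, f in zip(marks, freqs)) / n   # 67.45

m2 = sum((x - mean) ** 2 * f for x, f in zip(marks, freqs)) / n  # variance
m4 = sum((x - mean) ** 4 * f for x, f in zip(marks, freqs)) / n  # fourth moment

a4 = m4 / m2 ** 2   # kurtosis, formula (5)
g2 = a4 - 3         # excess kurtosis
# sample excess kurtosis, formula (6)
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(round(a4, 4), round(g2, 4), round(G2, 4))  # 2.7418 -0.2582 -0.2091
```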

Inferring

Your data set is just one sample drawn from a population. How far must the excess kurtosis be

from 0, before you can say that the population also has nonzero excess kurtosis?

The answer comes in a similar way to the similar question about skewness. You divide the

sample excess kurtosis by the standard error of kurtosis (SEK) to get the test statistic, which

tells you how many standard errors the sample excess kurtosis is from zero:

(7) test statistic: Zg2 = G2 / SEK, where SEK = 2 × SES × √[ (n²−1) / ((n−3)(n+5)) ]

The formula is adapted from page 89 of Duncan Cramer's Basic Statistics for Social Research

(Routledge, 1997). (Some authors suggest √(24/n), but for small samples that's a poor

approximation. And anyway, we've all got calculators, so you may as well do it right.)

The critical value of Zg2 is approximately 2. (This is a two-tailed test of excess kurtosis ≠ 0 at

approximately the 0.05 significance level.) 

- If Zg2 < −2, the population very likely has negative excess kurtosis (kurtosis < 3,

platykurtic), though you don't know how much.

- If Zg2 is between −2 and +2, you can't reach any conclusion about the kurtosis: excess

kurtosis might be positive, negative, or zero.

- If Zg2 > +2, the population very likely has positive excess kurtosis (kurtosis > 3,

leptokurtic), though you don't know how much.

For the sample college men's heights (n=100), you found excess kurtosis of G2 = −0.2091. The

sample is platykurtic, but is this enough to let you say that the whole population is platykurtic

(has lower kurtosis than the bell curve)?

First compute the standard error of kurtosis:

SEK = 2 × SES × √[ (n²−1) / ((n−3)(n+5)) ]

n = 100, and the SES was previously computed as 0.2414.

SEK = 2 × 0.2414 × √[ (100²−1) / (97×105) ] = 0.4784

The test statistic is


Zg2 = G2/SEK = −0.2091 / 0.4784 = −0.44

You can't say whether the kurtosis of the population is the same as or different from the

kurtosis of a normal distribution.
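Formula (7) can be sketched as follows. The SES value is the rounded 0.2414 from the skewness section, as in the text's own arithmetic; the code is an illustration, not part of the original:

```python
from math import sqrt

n = 100
G2 = -0.2091   # sample excess kurtosis from the heights example
SES = 0.2414   # standard error of skewness (rounded, as used in the text)

# standard error of kurtosis, formula (7)
SEK = 2 * SES * sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))

Z_g2 = G2 / SEK   # test statistic

print(round(SEK, 4), round(Z_g2, 2))  # 0.4784 -0.44
```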

Assessing Normality

There are many ways to assess normality, and unfortunately none of them are without

problems. Graphical methods are a good start, such as plotting a histogram and making a

quantile plot.

One test is the D'Agostino-Pearson omnibus test, so called because it uses the test statistics for

both skewness and kurtosis to come up with a single p-value. The test statistic is

(8) DP = Zg1² + Zg2², which follows χ² with df = 2

You can look up the p-value in a table, or use χ²cdf on a TI-83 or TI-84.

Caution: The D'Agostino-Pearson test has a tendency to err on the side of rejecting normality,

particularly with small sample sizes. David Moriarty, in his StatCat utility, recommends that you

don't use D'Agostino-Pearson for sample sizes below 20.

For college students' heights you had test statistics Zg1 = −0.45 for skewness and Zg2 = −0.44 for

kurtosis. The omnibus test statistic is

DP = Zg1² + Zg2² = (−0.45)² + (−0.44)² = 0.3961

and the p-value for χ²(2 df) > 0.3961, from a table or a statistics calculator, is 0.8203. You

cannot reject the assumption of normality. (Remember, you never accept the null hypothesis,

so you can't say from this test that the distribution is normal.) The histogram suggests

normality, and this test gives you no reason to reject that impression.
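Formula (8) needs no statistics package: for a chi-squared distribution with 2 degrees of freedom the upper-tail probability has the closed form P(X > x) = exp(−x/2). A sketch (an illustration added here, using the test statistics from the heights example):

```python
from math import exp

Z_g1 = -0.45   # skewness test statistic from the heights example
Z_g2 = -0.44   # kurtosis test statistic

DP = Z_g1 ** 2 + Z_g2 ** 2   # D'Agostino-Pearson omnibus statistic, formula (8)

# For chi-squared with 2 df, the survival function is exp(-x/2)
p_value = exp(-DP / 2)

print(round(DP, 4), round(p_value, 4))  # 0.3961 0.8203
```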

Example 2: Size of Rat Litters

For a second illustration of inferences about skewness and kurtosis of a population, I'll use an

example from Bulmer's Principles of Statistics:

Frequency distribution of litter size in rats, n=815

Litter size 1 2 3 4 5 6 7 8 9 10 11 12

Frequency 7 33 58 116 125 126 121 107 56 37 25 4 

I'll spare you the detailed calculations, but you should be able to verify them by following

equation (1) and equation (2):

n = 815, x̄ = 6.1252, m2 = 5.1721, m3 = 2.0316

skewness g1 = 0.1727 and sample skewness G1 = 0.1730


The sample is roughly symmetric but slightly skewed right, which looks about right from the

histogram. The standard error of skewness is

SES = √[ (6×815×814) / (813×816×818) ] = 0.0856

Dividing the skewness by the SES, you get the test statistic

Zg1 = 0.1730 / 0.0856 = 2.02

Since this is greater than 2, you can say that there is some positive skewness in the population. 

Again, "some positive skewness" just means a figure greater than zero; it doesn't tell us

anything more about the magnitude of the skewness.

If you go on to compute a 95% confidence interval of skewness from equation (4), you get

0.1730±2×0.0856 = 0.00 to 0.34.
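The standard-error and interval arithmetic is equally easy to script. A sketch (helper name mine, following the SES formula used above):

```python
def skew_se(n):
    """Standard error of skewness: sqrt( 6n(n-1) / ((n-2)(n+1)(n+3)) )."""
    return (6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))) ** 0.5

n, G1 = 815, 0.1730
ses = skew_se(n)
z_g1 = G1 / ses                          # test statistic for skewness
lo, hi = G1 - 2 * ses, G1 + 2 * ses      # rough 95% confidence interval
```

This reproduces SES ≈ 0.0856, Zg1 ≈ 2.02, and the interval 0.00 to 0.34.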

What about the kurtosis? You should be able to follow equation (5) and compute a fourth

moment of m4 = 67.3948. You already have m2 = 5.1721, and therefore

kurtosis a4 = m4 / m2² = 67.3948 / 5.1721² = 2.5194 

excess kurtosis g2 = 2.5194 − 3 = −0.4806

sample excess kurtosis G2 = [814/(813×812)] × [816×(−0.4806) + 6] = −0.4762

So the sample is moderately less peaked than a normal distribution. Again, this matches the

histogram, where you can see the higher shoulders.

What if anything can you say about the population? For this you need equation (7). Begin by
computing the standard error of kurtosis, using n = 815 and the previously computed SES of 0.0856:

SEK = 2 × SES × √[ (n² − 1) / ((n − 3)(n + 5)) ]

SEK = 2 × 0.0856 × √[ (815² − 1) / (812×820) ] = 0.1711

and divide:

Zg2 = G2/SEK = −0.4762 / 0.1711 = −2.78
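Again a quick sketch (helper name mine) confirms the kurtosis arithmetic:

```python
def kurt_se(n, ses):
    """Standard error of kurtosis: 2 * SES * sqrt( (n^2 - 1) / ((n-3)(n+5)) )."""
    return 2 * ses * ((n * n - 1) / ((n - 3) * (n + 5))) ** 0.5

n, G2 = 815, -0.4762
sek = kurt_se(n, 0.0856)
z_g2 = G2 / sek          # test statistic for kurtosis
```

This gives SEK ≈ 0.1711 and Zg2 ≈ −2.78.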

Since Zg2 is comfortably below −2, you can say that the distribution of all litter sizes is

platykurtic, less sharply peaked than the normal distribution. But be careful: you know that it is

platykurtic, but you don't know by how much.

You already know the population is not normal, but let's apply the D'Agostino-Pearson test

anyway:


DP = 2.02² + (−2.78)² = 11.8088

p-value = P( χ²(2) > 11.8088 ) = 0.0027

The test agrees with the separate tests of skewness and kurtosis: the size of rat litters, for the
entire population of rats, is not normally distributed.

How do I determine whether my data are normal?

•  There are four interrelated approaches to determine normality, and all of them should be
conducted.

1.  Look at a histogram with the normal curve superimposed. A histogram provides a
useful graphical representation of the data. - To provide a rough example of
normality and non-normality, see the following histograms. The black line
superimposed on the histograms represents the bell-shaped "normal" curve.
Notice how the data for variable1 are normal, and the data for variable2 are
non-normal. In this case, the non-normality is driven by the presence of an
outlier. For more information about outliers, see What are outliers?, How do I
detect outliers?, and How do I deal with outliers?. Problem -- All samples deviate
somewhat from normal, so the question is how much deviation from the black
line indicates non-normality? Unfortunately, graphical representations like
histograms provide no hard-and-fast rules. After you have viewed many (many!)
histograms, over time you will get a sense for the normality of data.

2. Look at the values of Skewness and Kurtosis. Skewness involves the symmetry
of the distribution. Skewness that is normal involves a perfectly symmetric
distribution. A positively skewed distribution has scores clustered to the left,
with the tail extending to the right. A negatively skewed distribution has scores
clustered to the right, with the tail extending to the left. Kurtosis involves the
peakedness of the distribution. Kurtosis that is normal involves a distribution
that is bell-shaped and not too peaked or flat. Positive kurtosis indicates a
distribution that is too peaked; negative kurtosis indicates a distribution that is
too flat. Both Skewness and Kurtosis are 0 in a normal distribution, so the
farther away from 0, the more non-normal the distribution. The question is how
much skew or kurtosis renders the data non-normal? This is an arbitrary
determination, and sometimes


difficult to interpret using the values of Skewness and Kurtosis. - The histogram
above for variable1 represents perfect symmetry (skewness) and perfect
peakedness (kurtosis); and the descriptive statistics below for variable1 parallel
this information by reporting "0" for both skewness and kurtosis. The histogram
above for variable2 represents positive skewness (tail extending to the right) and
positive kurtosis (high peak); and the descriptive statistics below for variable2
parallel this information. Problem -- The question is how much skew or kurtosis
renders the data non-normal? This is an arbitrary determination, and sometimes
difficult to interpret using the values of Skewness and Kurtosis. Luckily, there are
more objective tests of normality, described next.

3. Look at established tests for normality that take into account both Skewness
and Kurtosis simultaneously. The Kolmogorov-Smirnov (K-S) test and Shapiro-
Wilk (S-W) test are designed to test normality by comparing your data to a
normal distribution with the same mean and standard deviation as your sample.
If the test is NOT significant, then the data are normal, so any value above .05
indicates normality. If the test is significant (less than .05), then the data are non-
normal. - See the data below, which indicate variable1 is normal and
variable2 is non-normal. Also, keep in mind one limitation of the normality tests:
the larger the sample size, the more likely you are to get significant results. Thus,
you may get significant results with only slight deviations from normality when
sample sizes are large.
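For intuition about what the K-S test measures, here is a minimal pure-Python sketch (function names mine; real K-S/Lilliefors p-values come from tables or statistical software, so this computes only the distance statistic D, the largest gap between the empirical CDF and a fitted normal CDF):

```python
from math import erf, sqrt
from statistics import mean, stdev

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def ks_distance(data):
    """Largest vertical gap between the empirical CDF of the data and a
    normal CDF fitted to the sample's own mean and standard deviation."""
    xs = sorted(data)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

d_clean = ks_distance(list(range(1, 21)))            # roughly symmetric data
d_outlier = ks_distance(list(range(1, 20)) + [100])  # same data plus an outlier
```

The outlier version produces a much larger D, mirroring the variable2 example above.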

4. Look at normality plots of the data. A normal Q-Q plot provides a graphical
way to determine the level of normality. The black line indicates the values your
sample should adhere to if the distribution were normal. The dots are your actual
data. If the dots fall exactly on the black line, then your data are normal. If they
deviate from the black line, your data are non-normal. - Notice how
the data for variable1 fall along the line, whereas the data for variable2 deviate
from the line.
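The Q-Q construction itself is simple enough to sketch with the standard library (Python 3.8+; function name mine): pair each sorted observation with the quantile a fitted normal distribution predicts for its plotting position.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """(theoretical quantile, observed value) pairs; on a Q-Q plot the
    theoretical quantiles form the x-axis and the black reference line
    is y = x."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

pts = qq_points([2, 4, 5, 5, 6, 8])   # tiny hypothetical sample
```

Plotting these pairs (with any plotting tool) against the line y = x gives the normal Q-Q plot described above.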


5. What is correlation? Enumerate measurements of correlation and explain their uses. What

is the indication of the magnitude of a relationship? Give examples.

Correlation coefficients measure the strength of association between two variables. The most

common correlation coefficient, called the Pearson product-moment correlation coefficient,

measures the strength of the linear association between variables.

In this tutorial, when we speak simply of a correlation coefficient, we are referring to the

Pearson product-moment correlation. Generally, the correlation coefficient of a sample is

denoted by r, and the correlation coefficient of a population is denoted by ρ or R.

How to Interpret a Correlation Coefficient

The sign and the absolute value of a correlation coefficient describe the direction and the

magnitude of the relationship between two variables.

•  The value of a correlation coefficient ranges between -1 and 1.
•  The greater the absolute value of a correlation coefficient, the stronger the linear
relationship.
•  The strongest linear relationship is indicated by a correlation coefficient of -1 or 1.
•  The weakest linear relationship is indicated by a correlation coefficient equal to 0.
•  A positive correlation means that if one variable gets bigger, the other variable tends to
get bigger.
•  A negative correlation means that if one variable gets bigger, the other variable tends to
get smaller.

Keep in mind that the Pearson product-moment correlation coefficient only measures linear

relationships. Therefore, a correlation of 0 does not mean zero relationship between two

variables; rather, it means zero linear relationship. (It is possible for two variables to have zero

linear relationship and a strong curvilinear relationship at the same time.) 

Scatterplots and Correlation Coefficients

The scatterplots below show how different patterns of data produce different degrees of 

correlation.


•  Maximum positive correlation (r = 1.0)
•  Strong positive correlation (r = 0.80)
•  Zero correlation (r = 0)
•  Minimum negative correlation (r = -1.0)
•  Moderate negative correlation (r = -0.43)
•  Strong correlation & outlier (r = 0.71)

Several points are evident from the scatterplots.

•  When the slope of the line in the plot is negative, the correlation is negative; and vice
versa.
•  The strongest correlations (r = 1.0 and r = -1.0) occur when data points fall exactly on a
straight line.
•  The correlation becomes weaker as the data points become more scattered.
•  If the data points fall in a random pattern, the correlation is equal to zero.
•  Correlation is affected by outliers. Compare the first scatterplot with the last scatterplot.
The single outlier in the last plot greatly reduces the correlation (from 1.00 to 0.71).

How to Calculate a Correlation Coefficient

If you look in different statistics textbooks, you are likely to find different-looking (but

equivalent) formulas for computing a correlation coefficient. In this section, we present several

formulas that you may encounter.

The most common formula for computing a product-moment correlation coefficient (r) is given

below.


The formula below uses population means and population standard deviations to compute a

population correlation coefficient () from population data.

Product-moment correlation coefficient. The correlation r between two

variables is:

r = Σ(xy) / sqrt [ ( Σx² ) * ( Σy² ) ]

where Σ is the summation symbol, x = xi - x̄, xi is the x value for
observation i, x̄ is the mean x value, y = yi - ȳ, yi is the y value for
observation i, and ȳ is the mean y value.


The formula below uses sample means and sample standard deviations to compute a

correlation coefficient (r) from sample data.

The interpretation of the sample correlation coefficient depends on how the sample data is

collected. With a simple random sample, the sample correlation coefficient is an unbiased

estimate of the population correlation coefficient.

Each of the latter two formulas can be derived from the first formula. Use the second formula

when you have data from the entire population. Use the third formula when you only have

sample data. When in doubt, use the first formula. It is always correct.
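As an illustration, here is a small Python sketch (function name mine) of the standardized-score form of the sample correlation, checked on a perfect straight line and then on the same line with one outlier appended, echoing the scatterplot discussion above:

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation: r = [1/(n-1)] * sum of z_x * z_y."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum((x - mx) / sx * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

r_line = pearson_r([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])        # exactly y = 2x + 1
r_outlier = pearson_r([1, 2, 3, 4, 5, 6], [3, 5, 7, 9, 11, 1])
```

The straight line gives r = 1 exactly; the single outlier drags the correlation far below 1.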

Fortunately, you will rarely have to compute a correlation coefficient by hand. Many software

packages (e.g., Excel) and most graphing calculators have a correlation function that will do the

 job for you.

Note: Sometimes, it is not clear whether a software package or a graphing calculator is

computing a population correlation coefficient or a sample correlation coefficient. For example,

a casual user might not realize that Microsoft uses a population correlation coefficient (ρ) for

the Pearson function in its Excel software.

Population correlation coefficient. The correlation ρ between two
variables is:

ρ = [ 1 / N ] * Σ { [ (Xi - μX) / σx ] * [ (Yi - μY) / σy ] }

where N is the number of observations in the population, Σ is the
summation symbol, Xi is the X value for observation i, μX is the
population mean for variable X, Yi is the Y value for observation i, μY is
the population mean for variable Y, σx is the population standard
deviation of X, and σy is the population standard deviation of Y.

Sample correlation coefficient. The correlation r between two
variables is:

r = [ 1 / (n - 1) ] * Σ { [ (xi - x̄) / sx ] * [ (yi - ȳ) / sy ] }

where n is the number of observations in the sample, Σ is the summation
symbol, xi is the x value for observation i, x̄ is the sample mean of x, yi
is the y value for observation i, ȳ is the sample mean of y, sx is the
sample standard deviation of x, and sy is the sample standard deviation
of y.

6. What are the different measures in testing hypotheses? Explain each.

Hypothesis Tests

A statistical hypothesis is an assumption about a population parameter. This assumption may
or may not be true.

The best way to determine whether a statistical hypothesis is true would be to examine the

entire population. Since that is often impractical, researchers typically examine a random

sample from the population. If sample data are not consistent with the statistical hypothesis,

the hypothesis is rejected.

There are two types of statistical hypotheses.
•  Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that
sample observations result purely from chance.
•  Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the
hypothesis that sample observations are influenced by some non-random cause.

For example, suppose we wanted to determine whether a coin was fair and balanced. A null

hypothesis might be that half the flips would result in Heads and half, in Tails. The alternative

hypothesis might be that the number of Heads and Tails would be very different. Symbolically,

these hypotheses would be expressed as

H0: P = 0.5

Ha: P ≠ 0.5

Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we

would be inclined to reject the null hypothesis. We would conclude, based on the evidence,

that the coin was probably not fair and balanced.
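How unlikely is 40 heads in 50 flips under the null hypothesis? A quick sketch of the exact binomial calculation (pure Python; function name mine):

```python
from math import comb

def binom_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

p_upper = binom_upper_tail(50, 40)   # 40 or more heads out of 50
p_two_sided = 2 * p_upper            # symmetric because p = 0.5
```

The two-sided probability is on the order of 10⁻⁵, which is why we reject the fairness hypothesis.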

Can We Accept the Null Hypothesis?

Some researchers say that a hypothesis test can have one of two outcomes: you accept the null

hypothesis or you reject the null hypothesis. Many statisticians, however, take issue with the

notion of "accepting the null hypothesis." Instead, they say: you reject the null hypothesis or

you fail to reject the null hypothesis.

Why the distinction between "acceptance" and "failure to reject"? Acceptance implies that the null hypothesis is true. Failure to reject implies that the data are not sufficiently persuasive for

us to prefer the alternative hypothesis over the null hypothesis.

Hypothesis Tests

Statisticians follow a formal process to determine whether to reject a null hypothesis, based on

sample data. This process, called hypothesis testing, consists of four steps.

•  State the hypotheses. This involves stating the null and alternative hypotheses. The
hypotheses are stated in such a way that they are mutually exclusive. That is, if one is
true, the other must be false.
•  Formulate an analysis plan. The analysis plan describes how to use sample data to
evaluate the null hypothesis. The evaluation often focuses around a single test statistic.
•  Analyze sample data. Find the value of the test statistic (mean score, proportion, t-
score, z-score, etc.) described in the analysis plan.
•  Interpret results. Apply the decision rule described in the analysis plan. If the value of
the test statistic is unlikely, based on the null hypothesis, reject the null hypothesis.

Decision Errors

Two types of errors can result from a hypothesis test.

•  Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it
is true. The probability of committing a Type I error is called the significance level. This
probability is also called alpha, and is often denoted by α.
•  Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis
that is false. The probability of committing a Type II error is called Beta, and is often
denoted by β.


•  Formulate an analysis plan. The analysis plan describes how to use sample data to
evaluate the null hypothesis. It should specify the following elements.
o  Significance level. Often, researchers choose significance levels equal to 0.01,
0.05, or 0.10; but any value between 0 and 1 can be used.
o  Test method. Typically, the test method involves a test statistic and a sampling
distribution. Computed from sample data, the test statistic might be a mean
score, proportion, difference between means, difference between proportions,
z-score, t-score, chi-square, etc. Given a test statistic and its sampling
distribution, a researcher can assess probabilities associated with the test
statistic. If the test statistic probability is less than the significance level, the null
hypothesis is rejected.

•  Analyze sample data. Using sample data, perform computations called for in the analysis
plan.

o  Test statistic. When the null hypothesis involves a mean or proportion, use

either of the following equations to compute the test statistic.

Test statistic = (Statistic - Parameter) / (Standard deviation of statistic) 

Test statistic = (Statistic - Parameter) / (Standard error of statistic) 

where Parameter is the value appearing in the null hypothesis, and Statistic is the point

estimate of Parameter . As part of the analysis, you may need to compute the standard

deviation or standard error of the statistic. Previously, we presented common formulas for the

standard deviation and standard error.

When the parameter in the null hypothesis involves categorical data, you may use a chi-square

statistic as the test statistic. Instructions for computing a chi-square test statistic are presented

in the lesson on the chi-square goodness of fit test.

o  P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic, assuming the null hypothesis is true.

•  Interpret the results. If the sample findings are unlikely, given the null hypothesis, the

researcher rejects the null hypothesis. Typically, this involves comparing the P-value to

the significance level, and rejecting the null hypothesis when the P-value is less than the

significance level.

Hypothesis Test of the Mean

This lesson explains how to conduct a hypothesis test of a mean, when the following conditions
are met:
•  The sampling method is simple random sampling.
•  The sample is drawn from a normal or near-normal population.
Generally, the sampling distribution will be approximately normally distributed if any of the
following conditions apply.
•  The population distribution is normal.
•  The sampling distribution is symmetric, unimodal, without outliers, and the sample size
is 15 or less.
•  The sampling distribution is moderately skewed, unimodal, without outliers, and the
sample size is between 16 and 40.
•  The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) 

analyze sample data, and (4) interpret results.

State the Hypotheses


Every hypothesis test requires the analyst to state a null hypothesis and an alternative

hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if 

one is true, the other must be false; and vice versa.

The table below shows three sets of hypotheses. Each makes a statement about how the
population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not
equal to".)

Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μ = M             μ ≠ M                    2
2     μ > M             μ < M                    1
3     μ < M             μ > M                    1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on

either side of the sampling distribution would cause a researcher to reject the null hypothesis.

The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on

only one side of the sampling distribution would cause a researcher to reject the null

hypothesis.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to reject or fail to reject the null
hypothesis. It should specify the following elements.

•  Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
•  Test method. Use the one-sample t-test to determine whether the hypothesized mean
differs significantly from the observed sample mean.

Analyze Sample Data

Using sample data, conduct a one-sample t-test. This involves finding the standard error,

degrees of freedom, test statistic, and the P-value associated with the test statistic.

•  Standard error. Compute the standard error (SE) of the sampling distribution.

SE = s * sqrt{ ( 1/n ) * ( 1 - n/N ) * [ N / ( N - 1 ) ] }

where s is the standard deviation of the sample, N is the population size, and n is the sample
size. When the population size is much larger (at least 10 times larger) than the sample size, the
standard error can be approximated by:

SE = s / sqrt( n )

•  Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus
one. Thus, DF = n - 1.

•  Test statistic. The test statistic is a t-score (t) defined by the following equation.

t = (x̄ - μ) / SE

where x̄ is the sample mean, μ is the hypothesized population mean in the null hypothesis, and
SE is the standard error.

•  P-value. The P-value is the probability of observing a sample statistic as extreme as the

test statistic. Since the test statistic is a t-score, use the t Distribution Calculator to

assess the probability associated with the t-score, given the degrees of freedom

computed above. (See sample problems at the end of this lesson for examples of how

this is done.) 

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null

hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting

the null hypothesis when the P-value is less than the significance level.
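The computational steps above fit in a few lines. A sketch (function name mine) using the large-population approximation SE = s / sqrt(n); the p-value step still needs a t table or calculator, so only t and DF are computed:

```python
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t = (xbar - mu0) / SE with SE = s / sqrt(n); DF = n - 1."""
    n = len(data)
    se = stdev(data) / n ** 0.5
    return (mean(data) - mu0) / se, n - 1

# Hypothetical sample, testing H0: mu = 11
t, df = one_sample_t([12, 11, 13, 12, 12], 11)
```

For this toy sample, t ≈ 3.16 with 4 degrees of freedom, which would then be compared against the t distribution as described above.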


Hypothesis Test for the Difference Between Two Means

This lesson explains how to conduct a hypothesis test for the difference between two means.

The test procedure, called the two-sample t-test, is appropriate when the following conditions
are met:
•  The sampling method for each sample is simple random sampling.
•  The samples are independent.
•  Each population is at least 10 times larger than its respective sample.
•  Each sample is drawn from a normal or near-normal population. Generally, the sampling
distribution will be approximately normal if any of the following conditions apply.
o  The population distribution is normal.
o  The sample data are symmetric, unimodal, without outliers, and the sample size
is 15 or less.
o  The sample data are slightly skewed, unimodal, without outliers, and the sample
size is 16 to 40.
o  The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) 

analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative

hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if 

one is true, the other must be false; and vice versa.

The table below shows three sets of null and alternative hypotheses. Each makes a statement
about the difference d between the mean of one population μ1 and the mean of another
population μ2. (In the table, the symbol ≠ means "not equal to".)

Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μ1 - μ2 = d       μ1 - μ2 ≠ d              2
2     μ1 - μ2 > d       μ1 - μ2 < d              1
3     μ1 - μ2 < d       μ1 - μ2 > d              1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis.

The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on

only one side of the sampling distribution would cause a researcher to reject the null

hypothesis.

When the null hypothesis states that there is no difference between the two population means

(i.e., d = 0), the null and alternative hypothesis are often stated in the following form.

H0: μ1 = μ2

Ha: μ1 ≠ μ2

Formulate an Analysis Plan

The analysis plan describes how to use sample data to reject or fail to reject the null
hypothesis. It should specify the following elements.

•  Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.


•  Test method. Use the two-sample t-test to determine whether the difference between

means found in the sample is significantly different from the hypothesized difference

between means.

Analyze Sample Data

Using sample data, find the standard error, degrees of freedom, test statistic, and the P-value

associated with the test statistic.

•  Standard error. Compute the standard error (SE) of the sampling distribution.

SE = sqrt[ (s1²/n1) + (s2²/n2) ]

where s1 is the standard deviation of sample 1, s2 is the standard deviation of sample 2, n1 is the
size of sample 1, and n2 is the size of sample 2.

•  Degrees of freedom. The degrees of freedom (DF) is:

DF = (s1²/n1 + s2²/n2)² / { [ (s1²/n1)² / (n1 - 1) ] + [ (s2²/n2)² / (n2 - 1) ] }

If DF does not compute to an integer, round it off to the nearest whole number. Some texts

suggest that the degrees of freedom can be approximated by the smaller of n1 - 1 and n2 - 1;

but the above formula gives better results.

•  Test statistic. The test statistic is a t-score (t) defined by the following equation.

t = [ (x̄1 - x̄2) - d ] / SE

where x̄1 is the mean of sample 1, x̄2 is the mean of sample 2, d is the hypothesized difference
between population means, and SE is the standard error.
•  P-value. The P-value is the probability of observing a sample statistic as extreme as the

test statistic. Since the test statistic is a t-score, use the t Distribution Calculator to

assess the probability associated with the t-score, having the degrees of freedom

computed above. (See sample problems at the end of this lesson for examples of how

this is done.) 

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null

hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
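Working from summary statistics, the standard error and degrees-of-freedom formulas above can be sketched as (function name mine; the sample standard deviations and sizes are hypothetical):

```python
def welch_se_df(s1, n1, s2, n2):
    """SE = sqrt(s1^2/n1 + s2^2/n2) and the degrees-of-freedom
    formula given above (round DF to the nearest whole number)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df

se, df = welch_se_df(2.0, 10, 3.0, 10)   # hypothetical summary statistics
```

Here SE ≈ 1.14 and DF ≈ 15.7, which would be rounded to 16 before consulting a t table.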

Hypothesis Test for Difference Between Matched Pairs

This lesson explains how to conduct a hypothesis test for the difference between paired means.

The test procedure, called the matched-pairs t-test, is appropriate when the following

conditions are met:

•  The sampling method for each sample is simple random sampling.
•  The test is conducted on paired data. (As a result, the data sets are not independent.)
•  Each sample is drawn from a normal or near-normal population. Generally, the sampling
distribution will be approximately normal if any of the following conditions apply.
o  The population distribution is normal.
o  The sample data are symmetric, unimodal, without outliers, and the sample size
is 15 or less.
o  The sample data are slightly skewed, unimodal, without outliers, and the sample
size is 16 to 40.
o  The sample size is greater than 40, without outliers.


This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) 

analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if

one is true, the other must be false; and vice versa.

The hypotheses concern a new variable d, which is based on the difference between paired

values from two data sets.

d = x1 - x2 

where x1 is the value of variable x in the first data set, and x2 is the value of the variable from

the second data set that is paired with x1.

The table below shows three sets of null and alternative hypotheses. Each makes a statement
about how the true population mean difference μd is related to some hypothesized value D.
(In the table, the symbol ≠ means "not equal to".)

Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μd = D            μd ≠ D                   2
2     μd > D            μd < D                   1
3     μd < D            μd > D                   1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on

either side of the sampling distribution would cause a researcher to reject the null hypothesis.

The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on

only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to reject or fail to reject the null
hypothesis. It should specify the following elements.

•  Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
•  Test method. Use the matched-pairs t-test to determine whether the difference
between sample means for paired data is significantly different from the hypothesized
difference between population means.

Analyze Sample Data

Using sample data, find the standard deviation, standard error, degrees of freedom, test

statistic, and the P-value associated with the test statistic.

•  Standard deviation. Compute the standard deviation (sd) of the differences computed
from n matched pairs.

sd = sqrt [ Σ(di - d̄)² / (n - 1) ]

where di is the difference for pair i, d̄ is the sample mean of the differences, and n is the
number of paired values.
•  Standard error. Compute the standard error (SE) of the sampling distribution of d̄.

SE = sd * sqrt{ ( 1/n ) * ( 1 - n/N ) * [ N / ( N - 1 ) ] }

where sd is the standard deviation of the sample difference, N is the population size, and n is

the sample size. When the population size is much larger (at least 10 times larger) than the

sample size, the standard error can be approximated by:

SE = sd / sqrt( n ) 


•  Degrees of freedom. The degrees of freedom (DF) is: DF = n - 1.
•  Test statistic. The test statistic is a t-score (t) defined by the following equation.

t = [ (x̄1 - x̄2) - D ] / SE = (d̄ - D) / SE

where x̄1 is the mean of sample 1, x̄2 is the mean of sample 2, d̄ is the mean difference between
paired values in the sample, D is the hypothesized difference between population means, and
SE is the standard error.
•  P-value. The P-value is the probability of observing a sample statistic as extreme as the

test statistic. Since the test statistic is a t-score, use the t Distribution Calculator to

assess the probability associated with the t-score, having the degrees of freedom

computed above. (See the sample problem at the end of this lesson for guidance on

how this is done.) 

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null

hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting

the null hypothesis when the P-value is less than the significance level.
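The matched-pairs chain (differences, then sd, SE, and t) can be sketched as follows (function name and data mine; uses the SE = sd / sqrt(n) approximation given above):

```python
from statistics import mean, stdev

def paired_t(x1, x2, hyp_diff=0.0):
    """t = (dbar - D) / SE from the pairwise differences d_i = x1_i - x2_i."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    se = stdev(d) / n ** 0.5
    return (mean(d) - hyp_diff) / se, n - 1

# Hypothetical before/after pairs, testing H0: mean difference = 0
t, df = paired_t([10, 12, 11, 14], [9, 11, 10, 12])
```

For these toy pairs, t = 5.0 with 3 degrees of freedom, which would then be assessed against the t distribution.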

Hypothesis Test for a Proportion

This lesson explains how to conduct a hypothesis test of a proportion, when the following

conditions are met:

•  The sampling method is simple random sampling.
•  Each sample point can result in just two possible outcomes. We call one of these
outcomes a success and the other, a failure.
•  The sample includes at least 10 successes and 10 failures. (Some texts say that 5
successes and 5 failures are enough.)
•  The population size is at least 10 times as big as the sample size.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) 

analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative

hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if 

one is true, the other must be false; and vice versa.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It

should specify the following elements.

* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or

0.10; but any value between 0 and 1 can be used.

* Test method. Use the one-sample z-test to determine whether the hypothesized

population proportion differs significantly from the observed sample proportion.

Analyze Sample Data

Using sample data, find the test statistic and its associated P-Value.

* Standard deviation. Compute the standard deviation (σ) of the sampling distribution.

σ = sqrt[ P * ( 1 - P ) / n ]

where P is the hypothesized value of population proportion in the null hypothesis, and n is the

sample size.

* Test statistic. The test statistic is a z-score (z) defined by the following equation.


z = (p - P) / σ

where P is the hypothesized value of population proportion in the null hypothesis, p is the

sample proportion, and σ is the standard deviation of the sampling distribution.

* P-value. The P-value is the probability of observing a sample statistic as extreme as the

test statistic. Since the test statistic is a z-score, use the Normal Distribution Calculator

to assess the probability associated with the z-score. (See sample problems at the end of this lesson for examples of how this is done.) 
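In place of the Normal Distribution Calculator, the same computation can be done with Python's standard library. This is a sketch; the function name and the 60-successes-in-100-trials example are illustrative, not from the lesson:

```python
import math
from statistics import NormalDist

def one_proportion_z(p_hat, P, n):
    """z = (p - P) / sigma, with sigma = sqrt(P(1 - P)/n) computed
    under the null hypothesis; returns z and the two-tailed P-value."""
    sigma = math.sqrt(P * (1 - P) / n)
    z = (p_hat - P) / sigma
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # standard normal tail areas
    return z, p_value

# Example: 60 successes in 100 trials against H0: P = 0.50
z, p = one_proportion_z(0.60, 0.50, 100)
```

With these illustrative numbers, σ = 0.05 and z = 2.0; the two-tailed P-value (about 0.046) would then be compared to the chosen significance level.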

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null

hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting

the null hypothesis when the P-value is less than the significance level.

Hypothesis Tests of Proportions (Small Sample)

In the previous lesson, we showed how to conduct a hypothesis test for a proportion when the

sample included at least 10 successes and 10 failures. This requirement serves two purposes:

* It guarantees that the sample size will be at least 20 when the proportion is 0.5.

* It ensures that the minimum acceptable sample size increases as the proportion

becomes more extreme.

When the sample does not include at least 10 successes and 10 failures, the sample size will be

too small to justify the hypothesis testing approach presented in the previous lesson. This

lesson describes how to test a hypothesis about a proportion when the sample size is small, as

long as the sample includes at least one success and one failure. The key steps are:

* Formulate the hypotheses to be tested. This means stating the null hypothesis and the

alternative hypothesis.

* Determine the sampling distribution of the proportion. If the sample proportion is the

outcome of a binomial experiment, the sampling distribution will be binomial. If it is the

outcome of a hypergeometric experiment, the sampling distribution will be

hypergeometric.

* Specify the significance level. (Researchers often set the significance level equal to 0.05

or 0.01, although other values may be used.) 

* Based on the hypotheses, the sampling distribution, and the significance level, define

the region of acceptance.

* Test the null hypothesis. If the sample proportion falls within the region of acceptance,

accept the null hypothesis; otherwise, reject the null hypothesis.

Hypothesis Test for Difference Between Proportions

This lesson explains how to conduct a hypothesis test to determine whether the difference

between two proportions is significant. The test procedure, called the two-proportion z-test, is

appropriate when the following conditions are met:

* The sampling method for each population is simple random sampling.

* The samples are independent.

* Each sample includes at least 10 successes and 10 failures. (Some texts say that 5 successes

and 5 failures are enough.) 

* Each population is at least 10 times as big as its sample.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) 

analyze sample data, and (4) interpret results.

State the Hypotheses


Every hypothesis test requires the analyst to state a null hypothesis and an alternative

hypothesis. The table below shows three sets of hypotheses. Each makes a statement about the

difference d between two population proportions, P1 and P2. (In the table, the symbol ≠ means

"not equal to".)

Set   Null hypothesis   Alternative hypothesis   Number of tails

1     P1 - P2 = 0       P1 - P2 ≠ 0              2

2     P1 - P2 ≥ 0       P1 - P2 < 0              1

3     P1 - P2 ≤ 0       P1 - P2 > 0              1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on

either side of the sampling distribution would cause a researcher to reject the null hypothesis.

The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on

only one side of the sampling distribution would cause a researcher to reject the null

hypothesis.

When the null hypothesis states that there is no difference between the two population

proportions (i.e., d = 0), the null and alternative hypothesis for a two-tailed test are often stated

in the following form.

H0: P1 = P2

Ha: P1 ≠ P2

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It

should specify the following elements.

* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10;

but any value between 0 and 1 can be used.

* Test method. Use the two-proportion z-test (described in the next section) to determine

whether the hypothesized difference between population proportions differs significantly from

the observed sample difference.

Analyze Sample Data

Using sample data, complete the following computations to find the test statistic and its

associated P-Value.

* Pooled sample proportion. Since the null hypothesis states that P1=P2, we use a pooled

sample proportion (p) to compute the standard error of the sampling distribution.

p = (p1 * n1 + p2 * n2) / (n1 + n2) 

where p1 is the sample proportion from population 1, p2 is the sample proportion from

population 2, n1 is the size of sample 1, and n2 is the size of sample 2.

* Standard error. Compute the standard error (SE) of the sampling distribution of the difference between two proportions.

SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }

where p is the pooled sample proportion, n1 is the size of sample 1, and n2 is the size of 

sample 2.


* Test statistic. The test statistic is a z-score (z) defined by the following equation.

z = (p1 - p2) / SE

where p1 is the proportion from sample 1, p2 is the proportion from sample 2, and SE is the

standard error of the sampling distribution.

* P-value. The P-value is the probability of observing a sample statistic as extreme as the test

statistic. Since the test statistic is a z-score, use the Normal Distribution Calculator to assess the

probability associated with the z-score.

The analysis described above is a two-proportion z-test.
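The pooled-proportion, standard-error, and z-score steps above fit naturally into one small Python function. This is a sketch using only the standard library; the function name and the sample proportions are illustrative, not from the lesson:

```python
import math
from statistics import NormalDist

def two_proportion_z(p1, n1, p2, n2):
    """Two-proportion z-test with a pooled sample proportion:
    p = (p1*n1 + p2*n2)/(n1 + n2), SE = sqrt(p(1-p)(1/n1 + 1/n2)),
    z = (p1 - p2)/SE; returns z and the two-tailed P-value."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)               # pooled sample proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # standard error
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed
    return z, p_value

# Illustrative samples: 38% of 100 vs. 51% of 100, H0: P1 = P2
z, p = two_proportion_z(0.38, 100, 0.51, 100)
```

The returned z-score would be assessed against the standard normal distribution, just as the lesson directs with the Normal Distribution Calculator.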

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null

hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting

the null hypothesis when the P-value is less than the significance level.

Region of Acceptance

In this lesson, we describe how to find the region of acceptance for a hypothesis test.

One-Tailed and Two-Tailed Hypothesis Tests

The steps taken to define the region of acceptance will vary, depending on whether the null

hypothesis and the alternative hypothesis call for one- or two-tailed hypothesis tests. So we

begin with a brief review.

The table below shows three sets of hypotheses. Each makes a statement about how the

population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not equal to".)

Set   Null hypothesis   Alternative hypothesis   Number of tails

1     μ = M             μ ≠ M                    2

2     μ ≥ M             μ < M                    1

3     μ ≤ M             μ > M                    1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on

either side of the sampling distribution would cause a researcher to reject the null hypothesis.

The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on

only one side of the sampling distribution would cause a researcher to reject the null

hypothesis.

How to Find the Region of Acceptance

We define the region of acceptance in such a way that the chance of making a Type I error is

equal to the significance level. Here is how that is done.

* Define a test statistic. Here, the test statistic is the sample measure used to estimate the

population parameter that appears in the null hypothesis. For example, suppose the null

hypothesis is

H0: μ = M

The test statistic, used to estimate M, would be m. If M were a population mean, m would be

the sample mean; if M were a population proportion, m would be the sample proportion; if M 

were a difference between population means, m would be the difference between sample

means; and so on.


* Given the significance level α, find the upper limit (UL) of the region of acceptance.

There are three possibilities, depending on the form of the null hypothesis.

  - If the null hypothesis is μ ≤ M: The upper limit of the region of acceptance will be

equal to the value for which the cumulative probability of the sampling

distribution is equal to one minus the significance level. That is, P( m < UL ) = 1 - α.

  - If the null hypothesis is μ = M: The upper limit of the region of acceptance will be

equal to the value for which the cumulative probability of the sampling

distribution is equal to one minus the significance level divided by 2. That is, P( m < UL ) = 1 - α/2.

  - If the null hypothesis is μ ≥ M: The upper limit of the region of acceptance is

equal to plus infinity, unless the test statistic is a proportion or a percentage.

The upper limit is 1 for a proportion, and 100 for a percentage.

* In a similar way, we find the lower limit (LL) of the region of acceptance. Again, there are

three possibilities, depending on the form of the null hypothesis.

  - If the null hypothesis is μ ≤ M: The lower limit of the region of acceptance is

equal to minus infinity, unless the test statistic is a proportion or a percentage.

The lower limit for a proportion or a percentage is zero.

  - If the null hypothesis is μ = M: The lower limit of the region of acceptance will be

equal to the value for which the cumulative probability of the sampling

distribution is equal to the significance level divided by 2. That is, P( m < LL ) = α/2.

  - If the null hypothesis is μ ≥ M: The lower limit of the region of acceptance will be

equal to the value for which the cumulative probability of the sampling

distribution is equal to the significance level. That is, P( m < LL ) = α.

The region of acceptance is defined by the range between LL and UL.
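For the common two-tailed case with a normal sampling distribution, the limits can be computed directly from the inverse CDF. This is a minimal sketch assuming a known population standard deviation; the function name and the numbers in the example (M = 100, σ = 15, n = 36) are hypothetical:

```python
import math
from statistics import NormalDist

def region_of_acceptance(M, sigma, n, alpha):
    """LL and UL for a two-tailed test of H0: mu = M, chosen so that
    P(m < LL) = alpha/2 and P(m < UL) = 1 - alpha/2."""
    se = sigma / math.sqrt(n)              # standard error of the sample mean
    sampling = NormalDist(mu=M, sigma=se)  # sampling distribution of m under H0
    return sampling.inv_cdf(alpha / 2), sampling.inv_cdf(1 - alpha / 2)

# H0: mu = 100, sigma = 15, n = 36, significance level 0.05
ll, ul = region_of_acceptance(100, 15, 36, 0.05)
```

By construction, a sample mean outside [LL, UL] occurs with probability α when the null hypothesis is true, which is exactly the Type I error rate described above.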

Power of a Hypothesis Test

The probability of not committing a Type II error is called the power of a hypothesis test.

Effect Size

To compute the power of the test, one offers an alternative view about the "true" value of the

population parameter, assuming that the null hypothesis is false. The effect size is the difference between the true value and the value specified in the null hypothesis.

Effect size = True value - Hypothesized value

For example, suppose the null hypothesis states that a population mean is equal to 100. A

researcher might ask: What is the probability of rejecting the null hypothesis if the true

population mean is equal to 90? In this example, the effect size would be 90 - 100, which equals

-10.

Factors That Affect Power

The power of a hypothesis test is affected by three factors.

* Sample size (n). Other things being equal, the greater the sample size, the greater the power of the test.

* Significance level (α). The higher the significance level, the higher the power of the test.

If you increase the significance level, you reduce the region of acceptance. As a result,

you are more likely to reject the null hypothesis. This means you are less likely to accept

the null hypothesis when it is false; i.e., less likely to make a Type II error. Hence, the

power of the test is increased.


* The "true" value of the parameter being tested. The greater the difference between the

"true" value of a parameter and the value specified in the null hypothesis, the greater

the power of the test. That is, the greater the effect size, the greater the power of the

test.

How to Compute Power

When a researcher designs a study to test a hypothesis, he/she should compute the power of 

the test (i.e., the likelihood of avoiding a Type II error).

How to Compute the Power of a Hypothesis Test

To compute the power of a hypothesis test, use the following three-step procedure.

* Define the region of acceptance. Previously, we showed how to compute the region of

acceptance for a hypothesis test.

* Specify the critical parameter value. The critical parameter value is an alternative to the

value specified in the null hypothesis. The difference between the critical parameter

value and the value from the null hypothesis is called the effect size. That is, the effect

size is equal to the critical parameter value minus the value from the null hypothesis.

* Compute power. Assume that the true population parameter is equal to the critical

parameter value, rather than the value specified in the null hypothesis. Based on that

assumption, compute the probability that the sample estimate of the population

parameter will fall outside the region of acceptance. That probability is the power of the

test.
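The three-step procedure can be sketched for a two-tailed z-test of a mean. This is an illustration under assumed conditions (known σ, normal sampling distribution); the function name and the numbers reuse the hypothetical effect-size example above (H0: μ = 100, critical value 90):

```python
import math
from statistics import NormalDist

def power_of_z_test(M, critical_value, sigma, n, alpha):
    """Power = P(sample mean falls outside the region of acceptance)
    when the true mean equals critical_value (two-tailed z-test)."""
    se = sigma / math.sqrt(n)
    null_dist = NormalDist(mu=M, sigma=se)
    ll = null_dist.inv_cdf(alpha / 2)       # region of acceptance [LL, UL]
    ul = null_dist.inv_cdf(1 - alpha / 2)   # defined under H0
    true_dist = NormalDist(mu=critical_value, sigma=se)
    # probability of landing outside [LL, UL] under the "true" distribution
    return true_dist.cdf(ll) + (1 - true_dist.cdf(ul))

# H0: mu = 100; critical parameter value 90 (effect size -10);
# sigma = 15 and n = 36 are assumed for illustration
power = power_of_z_test(100, 90, 15, 36, 0.05)
```

With these assumed numbers the power is roughly 0.98, i.e., a true mean of 90 would almost always be detected.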

Chi-Square Goodness-of-Fit Test

This lesson explains how to conduct a chi-square goodness of fit test. The test is applied when

you have one categorical variable from a single population. It is used to determine whether

sample data are consistent with a hypothesized distribution.

For example, suppose a company printed baseball cards. It claimed that 30% of its cards were

rookies; 60%, veterans; and 10%, All-Stars. We could gather a random sample of baseball cards

and use a chi-square goodness of fit test to see whether our sample distribution differed

significantly from the distribution claimed by the company. The sample problem at the end of 

the lesson considers this example.

The test procedure described in this lesson is appropriate when the following conditions are

met:

* The sampling method is simple random sampling.

* The population is at least 10 times as large as the sample.

* The variable under study is categorical.

* The expected value for each level of the variable is at least 5.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) 

analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative

hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if 

one is true, the other must be false; and vice versa.

For a chi-square goodness of fit test, the hypotheses take the following form.

H0: The data are consistent with a specified distribution.

Ha: The data are not consistent with a specified distribution.


Typically, the null hypothesis specifies the proportion of observations at each level of the

categorical variable. The alternative hypothesis is that at least one of the specified proportions

is not true.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The

plan should specify the following elements.

* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or

0.10; but any value between 0 and 1 can be used.

* Test method. Use the chi-square goodness of fit test to determine whether observed

sample frequencies differ significantly from expected frequencies specified in the null

hypothesis. The chi-square goodness of fit test is described in the next section, and

demonstrated in the sample problem at the end of this lesson.

Analyze Sample Data

Using sample data, find the degrees of freedom, expected frequency counts, test statistic, and

the P-value associated with the test statistic.

* Degrees of freedom. The degrees of freedom (DF) is equal to the number of levels (k) of

the categorical variable minus 1: DF = k - 1 .

* Expected frequency counts. The expected frequency counts at each level of the

categorical variable are equal to the sample size times the hypothesized proportion

from the null hypothesis:

Ei = npi 

where Ei is the expected frequency count for the ith level of the categorical variable, n is the

total sample size, and pi is the hypothesized proportion of observations in level i.

* Test statistic. The test statistic is a chi-square random variable (Χ²) defined by the

following equation.

Χ² = Σ [ (Oi - Ei)² / Ei ]

where Oi is the observed frequency count for the ith level of the categorical variable, and Ei is

the expected frequency count for the ith level of the categorical variable.

* P-value. The P-value is the probability of observing a sample statistic as extreme as the

test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution

Calculator to assess the probability associated with the test statistic. Use the degrees of 

freedom computed above.
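The expected counts and the Χ² statistic can be computed in a few lines of Python. This sketch reuses the baseball-card proportions from the lesson, but the observed counts and the function name are illustrative assumptions:

```python
def chi_square_gof(observed, proportions):
    """Goodness-of-fit statistic: X^2 = sum of (Oi - Ei)^2 / Ei,
    with Ei = n * pi and DF = k - 1."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# 100 cards, claimed 30% rookies / 60% veterans / 10% All-Stars;
# the observed counts [50, 45, 5] are made up for illustration
stat, df = chi_square_gof([50, 45, 5], [0.30, 0.60, 0.10])
```

The statistic and DF = 2 would then go to a Chi-Square Distribution Calculator for the P-value; for reference, the 0.05 critical value at 2 degrees of freedom is about 5.99.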

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null

hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting

the null hypothesis when the P-value is less than the significance level.

Chi-Square Test for Homogeneity

This lesson explains how to conduct a chi-square test of homogeneity. The test is applied to a

single categorical variable from two different populations. It is used to determine whether

frequency counts are distributed identically across different populations.

For example, in a survey of TV viewing preferences, we might ask respondents to identify their

favorite program. We might ask the same question of two different populations, such as males

and females. We could use a chi-square test for homogeneity to determine whether male

viewing preferences differed significantly from female viewing preferences. The sample problem

at the end of the lesson considers this example.


The test procedure described in this lesson is appropriate when the following conditions are

met:

* For each population, the sampling method is simple random sampling.

* Each population is at least 10 times as large as its respective sample.

* The variable under study is categorical.

* If sample data are displayed in a contingency table (Populations x Category levels), the

expected frequency count for each cell of the table is at least 5.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) 

analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative

hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if 

one is true, the other must be false; and vice versa.

Suppose that data were sampled from r populations, and assume that the categorical variable

had c levels. At any specified level of the categorical variable, the null hypothesis states that

each population has the same proportion of observations. Thus,

H0: P(level 1 of population 1) = P(level 1 of population 2) = . . . = P(level 1 of population r)

H0: P(level 2 of population 1) = P(level 2 of population 2) = . . . = P(level 2 of population r)

. . .

H0: P(level c of population 1) = P(level c of population 2) = . . . = P(level c of population r)

The alternative hypothesis (Ha) is that at least one of the null hypothesis statements is false.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The

plan should specify the following elements.

* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or

0.10; but any value between 0 and 1 can be used.

* Test method. Use the chi-square test for homogeneity to determine whether observed

sample frequencies differ significantly from expected frequencies specified in the null

hypothesis. The chi-square test for homogeneity is described in the next section.

Analyze Sample Data

Using sample data from the contingency tables, find the degrees of freedom, expected

frequency counts, test statistic, and the P-value associated with the test statistic. The analysis

described in this section is illustrated in the sample problem at the end of this lesson.

* Degrees of freedom. The degrees of freedom (DF) is equal to:

DF = (r - 1) * (c - 1) 

where r is the number of populations, and c is the number of levels for the categorical variable.

* Expected frequency counts. The expected frequency counts are computed separately

for each population at each level of the categorical variable, according to the following

formula.

Er,c = (nr * nc) / n

where Er,c is the expected frequency count for population r at level c of the categorical variable,

nr is the total number of observations from population r, nc is the total number of observations

at treatment level c, and n is the total sample size.

* Test statistic. The test statistic is a chi-square random variable (Χ²) defined by the

following equation.

Χ² = Σ [ (Or,c - Er,c)² / Er,c ]


where Or,c is the observed frequency count in population r for level c of the categorical variable,

and Er,c is the expected frequency count in population r for level c of the categorical variable.

* P-value. The P-value is the probability of observing a sample statistic as extreme as the

test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution

Calculator to assess the probability associated with the test statistic. Use the degrees of 

freedom computed above.
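The formula Er,c = (nr * nc) / n can be applied to every cell of the contingency table at once. This is a sketch; the function name and the TV-preference counts (rows are the two populations, columns are three favorite programs) are illustrative assumptions:

```python
def expected_counts(table):
    """Er,c = (nr * nc) / n for each cell of an r x c contingency table,
    where nr and nc are the row and column totals and n is the grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Rows: males, females; columns: three hypothetical favorite programs
expected = expected_counts([[50, 30, 20], [30, 40, 30]])
```

Each expected count should be checked against the at-least-5 condition before running the test.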

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null

hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting

the null hypothesis when the P-value is less than the significance level.

Chi-Square Test for Independence

This lesson explains how to conduct a chi-square test for independence. The test is applied

when you have two categorical variables from a single population. It is used to determine

whether there is a significant association between the two variables.

For example, in an election survey, voters might be classified by gender (male or female) and

voting preference (Democrat, Republican, or Independent). We could use a chi-square test for

independence to determine whether gender is related to voting preference. The sample

problem at the end of the lesson considers this example.

The test procedure described in this lesson is appropriate when the following conditions are

met:

* The sampling method is simple random sampling.

* Each population is at least 10 times as large as its respective sample.

* The variables under study are each categorical.

* If sample data are displayed in a contingency table, the expected frequency count for

each cell of the table is at least 5.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) 

analyze sample data, and (4) interpret results.

State the Hypotheses

Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that

knowing the level of Variable A does not help you predict the level of Variable B. That is, the

variables are independent.

H0: Variable A and Variable B are independent.

Ha: Variable A and Variable B are not independent.

The alternative hypothesis is that knowing the level of Variable A can help you predict the level

of Variable B.

Note: Support for the alternative hypothesis suggests that the variables are related; but the

relationship is not necessarily causal, in the sense that one variable "causes" the other.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The

plan should specify the following elements.

* Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or

0.10; but any value between 0 and 1 can be used.

* Test method. Use the chi-square test for independence to determine whether there is a

significant relationship between two categorical variables.


Analyze Sample Data

Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-

value associated with the test statistic. The approach described in this section is illustrated in

the sample problem at the end of this lesson.

* Degrees of freedom. The degrees of freedom (DF) is equal to:

DF = (r - 1) * (c - 1)

where r is the number of levels for one categorical variable, and c is the number of levels for

the other categorical variable.

* Expected frequencies. The expected frequency counts are computed separately for each

level of one categorical variable at each level of the other categorical variable. Compute

r * c expected frequencies, according to the following formula.

Er,c = (nr * nc) / n

where Er,c is the expected frequency count for level r of Variable A and level c of Variable B, nr is

the total number of sample observations at level r of Variable A, nc is the total number of 

sample observations at level c of Variable B, and n is the total sample size.

* Test statistic. The test statistic is a chi-square random variable (Χ²) defined by the

following equation.

Χ² = Σ [ (Or,c - Er,c)² / Er,c ]

where Or,c is the observed frequency count at level r of Variable A and level c of Variable B, and

Er,c is the expected frequency count at level r of Variable A and level c of Variable B.

* P-value. The P-value is the probability of observing a sample statistic as extreme as the

test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution

Calculator to assess the probability associated with the test statistic. Use the degrees of 

freedom computed above.
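The whole computation (expected counts, Χ² statistic, and degrees of freedom) fits in one short function. This sketch uses the gender-by-voting-preference layout from the example, but the counts and the function name are illustrative assumptions:

```python
def chi_square_independence(table):
    """X^2 = sum over all cells of (O - E)^2 / E, where
    E = (row total * column total) / n, and DF = (r - 1)(c - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (table[r][c] - row_totals[r] * col_totals[c] / n) ** 2
        / (row_totals[r] * col_totals[c] / n)
        for r in range(len(table))
        for c in range(len(table[0]))
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: male, female; columns: Democrat, Republican, Independent
# (illustrative counts, not from the lesson)
stat, df = chi_square_independence([[200, 150, 50], [250, 300, 50]])
```

The same computation serves the test for homogeneity, since its test statistic and degrees of freedom are defined by identical formulas.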

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null

hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting

the null hypothesis when the P-value is less than the significance level.