Chapter 9 Statistics


Page 1: Chapter 9

Chapter 9

Statistics

Page 2: Chapter 9

Frequency Distributions; Measures of Central Tendency

Frequency Distributions

Three types of frequency distributions:
- Categorical: primarily for nominal- and ordinal-level data (FYI)
- Grouped: used when the range of the data is large
- Ungrouped: used when the range of the data is small; each class is a single data value (FYI)

Page 3: Chapter 9

Frequency Distributions; Measures of Central Tendency

Grouped Frequency Distributions

Step 1: Order the data from smallest to largest.
Step 2: Determine the number of classes (i.e., class intervals) using Sturges’ Rule:

$$k = 1 + 3.322 \log_{10} n$$

where n is the number of observations (data values). *Always round k up.

Class intervals are contiguous, nonoverlapping intervals selected in such a way that they are mutually exclusive and exhaustive. That is, each and every value in the set of data can be placed in one, and only one, of the intervals.

Page 4: Chapter 9

Frequency Distributions; Measures of Central Tendency

Grouped Frequency Distributions

Step 3: Determine the width of the class intervals:

$$W = \frac{R}{k}$$

where R (the range) = largest value - smallest value, and k is the number of classes given by Sturges’ Rule.
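Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the sample size and the smallest and largest values below are hypothetical, not taken from the slides, and the function names are made up for the example.

```python
import math

def sturges_classes(n: int) -> int:
    """Number of classes k = 1 + 3.322 * log10(n), always rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(smallest: float, largest: float, k: int) -> float:
    """Class width W = R / k, where R = largest - smallest."""
    return (largest - smallest) / k

n = 50                          # hypothetical number of observations
k = sturges_classes(n)          # number of class intervals
w = class_width(12, 75, k)      # hypothetical smallest and largest values
print(f"k = {k}, W = {w}")
```

In practice the width is usually rounded to a convenient value so that the class limits are easy to read.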

Page 5: Chapter 9

Frequency Distributions; Measures of Central Tendency

Grouped Frequency Distributions

Step 4: Assign observations to class intervals.
- The count in each class interval is the frequency for that interval.
- The smallest observation serves as the first lower class limit (LCL).
- Add the ‘width minus one’ to the LCL to get the upper class limit (UCL).

NOTE: Technically, class limits (e.g., 0-5, 6-11, 12-17 and so on) are not adjacent. Class boundaries account for the space between the class limits (e.g., -0.5 to 5.5, 5.5 to 11.5, 11.5 to 17.5 and so on). Boundaries are written this way for convenience but are understood to mean all values up to but not including the upper boundary.

Page 6: Chapter 9

Frequency Distributions; Measures of Central Tendency

Grouped Frequency Distributions

Step 5: Calculate cumulative and relative frequencies.
- Cumulative frequency: add the number of observations from the first interval through the interval of interest, inclusive.
- Relative frequency: divide the number of observations in each class interval by the total number of observations.
- Cumulative relative frequency: same calculation as cumulative frequency, but using the relative frequencies.

A Frequency Distribution Table

Class Int. (LCL - UCL)   Freq.   Cum. Freq.   Rel. Freq.   Cum. Rel. Freq.
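As a rough sketch of Step 5, the snippet below fills in the table columns for the class limits and frequencies used in the grouped-data example later in this chapter. The variable names are illustrative, and the cumulative relative frequency is computed as cumulative frequency divided by n, which is equivalent to summing the relative frequencies.

```python
from itertools import accumulate

# Class limits and frequencies from the chapter's grouped-data example
class_limits = [(90, 98), (99, 107), (108, 116), (117, 125), (126, 134)]
freqs = [6, 22, 43, 28, 9]

n = sum(freqs)                               # total number of observations
cum_freqs = list(accumulate(freqs))          # running total of frequencies
rel_freqs = [f / n for f in freqs]           # share of observations per class
cum_rel_freqs = [c / n for c in cum_freqs]   # running share of observations

print(f"{'Class Int.':<12}{'Freq.':>6}{'Cum.':>6}{'Rel.':>8}{'Cum. Rel.':>11}")
for (lcl, ucl), f, c, r, cr in zip(class_limits, freqs, cum_freqs,
                                   rel_freqs, cum_rel_freqs):
    label = f"{lcl} - {ucl}"
    print(f"{label:<12}{f:>6}{c:>6}{r:>8.4f}{cr:>11.4f}")
```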

Page 7: Chapter 9

Frequency Distributions; Measures of Central Tendency

Measures of Central Tendency: the value(s) the data tend to center around
- Arithmetic mean (average)
- Mode
- Median

Page 8: Chapter 9

Frequency Distributions; Measures of Central Tendency

Measures of Central Tendency

Arithmetic mean (sample mean or sample average), written $\bar{x}$ (“x-bar”)

Ungrouped data (individual data values such as 5, 6, 10, 14, etc.):

$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

where $x_i$ is each data value (observation) in the data set and n is the number of observations in the data set.

Page 9: Chapter 9

Frequency Distributions; Measures of Central Tendency

Calculate the sample mean for ungrouped data:
Step 1: Add all values in the data set.
Step 2: Divide the total by the number of values summed.

Page 10: Chapter 9

Frequency Distributions; Measures of Central Tendency

Example

7.0 6.2 7.7 8.0 6.4 6.2 7.2 5.4 6.4 6.5 7.2 5.4

n = 12 *This is ungrouped data

$$\bar{x} = \frac{7.0+6.2+7.7+8.0+6.4+6.2+7.2+5.4+6.4+6.5+7.2+5.4}{12} = \frac{79.6}{12} = 6.63$$
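The two steps for ungrouped data translate directly into code. A minimal sketch using the twelve values from this example (the variable names are illustrative):

```python
# The twelve ungrouped observations from the example
values = [7.0, 6.2, 7.7, 8.0, 6.4, 6.2, 7.2, 5.4, 6.4, 6.5, 7.2, 5.4]

total = sum(values)          # Step 1: add all values
mean = total / len(values)   # Step 2: divide by the number of values

print(f"n = {len(values)}, sum = {total:.1f}, mean = {mean:.2f}")
```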

Page 11: Chapter 9

Frequency Distributions; Measures of Central Tendency

Grouped data (assumes each value (observation) falling within a given class interval is equal to the value of the midpoint of that interval):

$$\bar{x} = \frac{\sum f_i x_i}{n}$$

where $x_i$ represents each class interval midpoint (class mark)*, $f_i$ represents the frequency of that class, and n is the total number of observations.

*An easy way to determine the class mark is to add the upper class limit (boundary) to the lower class limit (boundary), then divide by 2.

Page 12: Chapter 9

Frequency Distributions; Measures of Central Tendency

Calculate the sample mean for grouped data:
Step 1: Multiply each class mark by its corresponding frequency.
Step 2: Add the resulting products.
Step 3: Divide the total by the number of observations.

Page 13: Chapter 9

Frequency Distributions; Measures of Central Tendency

Example

Class Limits   Frequency (f_i)     Class Mark (x_i)   x_i f_i
90 - 98          6 (see note below)       94              564
99 - 107        22                       103             2266
108 - 116       43                       112             4816
117 - 125       28                       121             3388
126 - 134        9                       130             1170
Total          108                                      12204

$$\bar{x} = \frac{12204}{108} = 113$$

Note: Where did the number 6 come from? There are 6 data values (observations) in the data set that fall in the range 90-98 (inclusive).
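The grouped-mean calculation can be checked with a short script. This sketch derives each class mark from the class limits and then follows the three steps; the tuple layout and names are assumptions made for the example.

```python
# (lower class limit, upper class limit, frequency) from the example table
classes = [(90, 98, 6), (99, 107, 22), (108, 116, 43),
           (117, 125, 28), (126, 134, 9)]

n = sum(f for _, _, f in classes)                  # total observations
# Steps 1-2: multiply each class mark by its frequency and add the products
total = sum(((lcl + ucl) / 2) * f for lcl, ucl, f in classes)
grouped_mean = total / n                           # Step 3: divide by n

print(f"n = {n}, sum of x_i*f_i = {total:.0f}, mean = {grouped_mean:.0f}")
```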

Page 14: Chapter 9

Frequency Distributions; Measures of Central Tendency

Measures of Central Tendency

Mode: the value that occurs most frequently

Ungrouped data
Step 1: Identify the data value that occurs most frequently.
- Bimodal: two values tied for the highest frequency
- No mode: all values are different (not the same as mode = 0)

Grouped data
Step 1: Specify the modal class (i.e., the class interval containing the largest number of observations).

Page 15: Chapter 9

Frequency Distributions; Measures of Central Tendency

For ungrouped data (mode)

7.0 6.2 7.7 8.0 6.4 6.2 7.2 5.4 6.4 6.5 7.2 5.4

Four numbers appear two times each: 5.4, 6.2, 6.4, and 7.2. Therefore there are four modes; the data set is multimodal ("quad-modal").

Page 16: Chapter 9

Frequency Distributions; Measures of Central Tendency

For grouped data (modal class)

The modal class is 108-116, the 3rd class (the class with the largest number of data values).
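Both mode calculations can be sketched with the standard library. The snippet counts occurrences for the ungrouped data and picks the class with the largest frequency for the grouped data. Returning an empty list when every value occurs once is a choice made here to mirror the "no mode" case above.

```python
from collections import Counter

# Ungrouped data: find every value tied for the highest count
values = [7.0, 6.2, 7.7, 8.0, 6.4, 6.2, 7.2, 5.4, 6.4, 6.5, 7.2, 5.4]
counts = Counter(values)
top = max(counts.values())
# If every value occurs exactly once, report no mode at all
modes = sorted(v for v, c in counts.items() if c == top) if top > 1 else []

# Grouped data: the modal class is the class with the largest frequency
class_freqs = {(90, 98): 6, (99, 107): 22, (108, 116): 43,
               (117, 125): 28, (126, 134): 9}
modal_class = max(class_freqs, key=class_freqs.get)

print("modes:", modes)
print("modal class:", modal_class)
```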

Page 17: Chapter 9

Frequency Distributions; Measures of Central Tendency

Measures of Central Tendency

Median: the value above which half the values in a data set lie and below which the other half lie (the middle value)

Ungrouped data
Step 1: Arrange the values in order of magnitude (smallest to largest).
Step 2: Locate the middle value.

Page 18: Chapter 9

Frequency Distributions; Measures of Central Tendency

For ungrouped data (median)

5.4 5.4 6.2 6.2 6.4 6.4 6.5 7.0 7.2 7.2 7.7 8.0

There is an even number of values, so the median is the average of the middle two values:

$$\frac{6.4 + 6.5}{2} = 6.45$$
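A small sketch of the median procedure, covering both the odd and even cases. The function name is illustrative; Python's `statistics.median` gives the same result.

```python
def median(values):
    """Middle value of the sorted data; average of the middle two if n is even."""
    ordered = sorted(values)              # Step 1: arrange smallest to largest
    mid = len(ordered) // 2               # Step 2: locate the middle
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

values = [7.0, 6.2, 7.7, 8.0, 6.4, 6.2, 7.2, 5.4, 6.4, 6.5, 7.2, 5.4]
print(f"median = {median(values):.2f}")
```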

Page 19: Chapter 9

Measures of Variation (Dispersion)

Range (R) (for ungrouped data only)

Ungrouped data
Step 1: Take the difference between the largest and smallest values in a data set. For example, a data set such as 5, 6, 10, 14 has a range of 9 because 14 (the largest value) minus 5 (the smallest value) is 9.

Page 20: Chapter 9

Measures of Variation (Dispersion)

Deviations from the Mean
Differences found by subtracting the mean from each number in a sample

Given 3, 5, 2, 6:
- The mean ($\bar{x}$) is 4.
- The deviations from the mean are -1, 1, -2, 2.

Page 21: Chapter 9

Measures of Variation (Dispersion)

Variance ($s^2$): an average of the squares of the deviations of the individual values from their mean.

Ungrouped data

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$

Page 22: Chapter 9

Measures of Variation (Dispersion)

Standard deviation (s)
Step 1: Calculate the sample standard deviation for grouped or ungrouped data by taking the square root of the variance:

$$s = \sqrt{s^2}$$

Page 23: Chapter 9

Measures of Variation (Dispersion)

Example

8 6 3 0 0 5 9 2 1 3 7 10 0 3 6

*This is ungrouped data. $\bar{x} = 4.2$, n = 15

(a) Range (R) = 10 - 0 = 10

(b) Variance:

$$s^2 = \frac{(8-4.2)^2 + (6-4.2)^2 + (3-4.2)^2 + (0-4.2)^2 + (0-4.2)^2 + (5-4.2)^2 + (9-4.2)^2 + (2-4.2)^2 + (1-4.2)^2 + (3-4.2)^2 + (7-4.2)^2 + (10-4.2)^2 + (0-4.2)^2 + (3-4.2)^2 + (6-4.2)^2}{15-1} = \frac{158.40}{14} = 11.31$$

(c) Standard deviation: $s = \sqrt{11.31} = 3.36$
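The three results in this example can be reproduced with a short script that follows the ungrouped formulas step by step. The variable names are illustrative; `statistics.variance` and `statistics.stdev` would give the same sample values.

```python
import math

values = [8, 6, 3, 0, 0, 5, 9, 2, 1, 3, 7, 10, 0, 3, 6]
n = len(values)
mean = sum(values) / n

data_range = max(values) - min(values)        # (a) largest minus smallest
deviations = [x - mean for x in values]       # deviations from the mean
sum_sq = sum(d ** 2 for d in deviations)      # sum of squared deviations
variance = sum_sq / (n - 1)                   # (b) sample variance
std_dev = math.sqrt(variance)                 # (c) square root of the variance

print(f"mean = {mean:.1f}, R = {data_range}")
print(f"sum of squares = {sum_sq:.2f}, s^2 = {variance:.2f}, s = {std_dev:.2f}")
```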

Page 24: Chapter 9

Measures of Variation (Dispersion)

Grouped data

$$s^2 = \frac{n\left(\sum x_i^2 f_i\right) - \left(\sum x_i f_i\right)^2}{n(n-1)}$$

where $x_i$ represents each class boundary (or limit) midpoint (class mark)*

where $f_i$ represents each class frequency

*An easy way to determine the class mark is to add the upper class limit (boundary) to the lower class limit (boundary), then divide by 2.

Page 25: Chapter 9

Measures of Variation (Dispersion)

Calculate the sample variance for grouped data:
Step 1: Multiply each squared class mark by its corresponding frequency.
Step 2: Add the resulting products.
Step 3: Multiply the sum by n. [A]
Step 4: Multiply each class mark by its corresponding frequency.
Step 5: Add the resulting products.
Step 6: Square the sum. [B]
Step 7: Perform the subtraction [C] = [A] - [B].
Step 8: Divide [C] by n(n-1).

Page 26: Chapter 9

Measures of Variation (Dispersion)

Example

Class limits   freq (f_i)   x_i    x_i f_i            x_i^2 f_i
90 - 98           6          94      564 (94 × 6)        53,016 (94^2 × 6)
99 - 107         22         103     2266               233,398
108 - 116        43         112     4816               539,392
117 - 125        28         121     3388               409,948
126 - 134         9         130     1170               152,100
Total           108                12204             1,387,854

Page 27: Chapter 9

Measures of Variation (Dispersion)

Refer to the formula for variance of grouped data below and see if you can fill in the formula using values from the table on the previous slide.

$$s^2 = \frac{n\left(\sum x_i^2 f_i\right) - \left(\sum x_i f_i\right)^2}{n(n-1)}$$

Page 28: Chapter 9

Measures of Variation (Dispersion)

$$s^2 = \frac{108(1{,}387{,}854) - (12{,}204)^2}{108(107)} = \frac{149{,}888{,}232 - 148{,}937{,}616}{11{,}556} = \frac{950{,}616}{11{,}556} = 82.26$$

Therefore $s = \sqrt{82.26} = 9.07$.
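The eight steps for the grouped variance can be traced in code using the class marks and frequencies from the table. This is a sketch; the intermediate names `a`, `b`, and `c` are chosen only to mirror the [A], [B], [C] labels in the steps.

```python
import math

class_marks = [94, 103, 112, 121, 130]   # x_i from the example table
freqs = [6, 22, 43, 28, 9]               # f_i from the example table
n = sum(freqs)

sum_x2f = sum(x ** 2 * f for x, f in zip(class_marks, freqs))  # Steps 1-2
a = n * sum_x2f                                                # Step 3: [A]
sum_xf = sum(x * f for x, f in zip(class_marks, freqs))        # Steps 4-5
b = sum_xf ** 2                                                # Step 6: [B]
c = a - b                                                      # Step 7: [C]
variance = c / (n * (n - 1))                                   # Step 8
std_dev = math.sqrt(variance)

print(f"[A] = {a:,}, [B] = {b:,}, [C] = {c:,}")
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")
```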

Page 29: Chapter 9

The Normal Distribution

The Normal Distribution

- Also known as the “bell-shaped” curve
- Some statisticians say it is the most important distribution in statistics
- Most popular distribution in statistics

[Figure: bell-shaped curve of f(x) plotted against x]

Page 30: Chapter 9

The Normal Distribution

The normal density function is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where $\pi \approx 3.142$ and $e \approx 2.718$.

Page 31: Chapter 9

The Normal Distribution

Properties of the Normal Distribution

- symmetrical about the mean
- mean = median = mode
- area under the curve = 1
- each different pair of μ and σ specifies a different normal distribution, thus the normal distribution is really a family of distributions
- a very important member of the family is the standard normal distribution

Page 32: Chapter 9

The Normal Distribution

The Standard Normal Distribution

- has mean (μ) = 0
- has standard deviation (σ) = 1
- the normal density function reduces to

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
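To make the density formula concrete, here is a minimal sketch that evaluates the normal density for any mean and standard deviation; with the defaults μ = 0 and σ = 1 it is the standard normal density. The function name and default arguments are choices made for this example.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coeff * math.exp(exponent)

# The curve peaks at the mean and is symmetrical about it
print(f"f(0)  = {normal_pdf(0):.4f}")
print(f"f(1)  = {normal_pdf(1):.4f}")
print(f"f(-1) = {normal_pdf(-1):.4f}")
```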

Page 33: Chapter 9

The Normal Distribution

The probability that z lies between any two points on the z-axis is determined by the area bounded by perpendiculars erected at each of the points, the curve, and the horizontal axis.

[Figure: standard normal curve f(z) with the area between z = a and z = b shaded, representing P(a < z < b)]

Page 34: Chapter 9

The Normal Distribution

Generally we find the area under the curve for a continuous distribution via calculus by integrating the function between a and b:

$$P(a < z < b) = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$

Page 35: Chapter 9

The Normal Distribution

However, we don't have to integrate because we have a table that has calculated this area. See TABLE 1 of Appendix A-2.
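When a table is not at hand, the same area can be computed numerically. The sketch below uses the error function from Python's `math` module, which is a standard way to evaluate the standard normal cumulative area; it is not the method used in the slides, and the function names are illustrative. The check value z = 0.56 is the one used in the exercise that follows.

```python
import math

def phi(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(a: float, b: float) -> float:
    """P(a < z < b) for the standard normal distribution."""
    return phi(b) - phi(a)

# Area from 0 out to z = 0.56, the quantity tabulated in the table
print(f"P(0 < z < 0.56) = {area_between(0, 0.56):.4f}")
```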

Page 36: Chapter 9

The Normal Distribution

Exercises 6-3 #7 p. 282

Find the area under the normal distribution curve between z = 0 and z = 0.56.

So we want P(0 < z < 0.56). From the standard normal table we find that P(0 < z < 0.56) = 0.2123.

[Figure: standard normal curve f(z) with the area between a = 0 and b = 0.56 shaded]

Page 37: Chapter 9

The Normal Distribution

Exercises 6-3 #16 p. 283

Find the area under the normal distribution curve between z = -0.87 and z = -0.21.

So we want P(-0.87 < z < -0.21).

[Figure: standard normal curve with the area between a = -0.87 and b = -0.21 shaded, both to the left of 0]

Page 38: Chapter 9

The Normal Distribution

Exercises 6-3 #16 p. 283 (cont'd)

The table gives an area of 0.3078 at z = 0.87 (the area is the same for negative or positive z because the distribution is symmetrical). This area covers values of z from 0 out to -0.87. Since we don't want that entire area, we subtract the area from 0 out to -0.21, that is, 0.0832, which is the table area at z = 0.21. So P(-0.87 < z < -0.21) = 0.3078 - 0.0832 = 0.2246.

Page 39: Chapter 9

The Normal Distribution

Exercises 6-3 #25 p. 283

Find the area under the normal distribution curve to the right of z = 1.92 and to the left of z = -0.44.

So we want P(z > 1.92) + P(z < -0.44) = 0.3574.

[Figure: standard normal curve with both tails shaded, to the left of a = -0.44 and to the right of b = 1.92]

Page 40: Chapter 9

The Normal Distribution

Exercises 6-3 #25 p. 283 (cont'd)

Since the table area at z = 0.44 is 0.1700, which is the area under the curve from 0 out to 0.44 (and, by symmetry, from -0.44 to 0), the area to the left of z = -0.44 has to be 0.5 - 0.1700 = 0.3300.

AND

Since the table area at z = 1.92 is 0.4726, which is the area under the curve from 0 out to 1.92, the area to the right of z = 1.92 has to be 0.5 - 0.4726 = 0.0274.

So the combined areas of interest are 0.3300 + 0.0274 = 0.3574.
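The table lookups in exercises #16 and #25 can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch, not the method of the slides; small differences in the fourth decimal place are expected because the table values are rounded before they are subtracted or added.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Exercise #16: area between z = -0.87 and z = -0.21
p_16 = Z.cdf(-0.21) - Z.cdf(-0.87)

# Exercise #25: area right of z = 1.92 plus area left of z = -0.44
right_tail = 1 - Z.cdf(1.92)
left_tail = Z.cdf(-0.44)
p_25 = right_tail + left_tail

print(f"P(-0.87 < z < -0.21)     = {p_16:.4f}")
print(f"P(z > 1.92) + P(z < -0.44) = {p_25:.4f}")
```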

Page 41: Chapter 9

The Normal Distribution

Exercises 6-3 #45 z = ?

Given that the shaded area is 0.8962, what is the value of z?

z has to be equal to -1.26. The area from 0 out to z is 0.3962 (0.8962 - 0.5000); recall that one-half of the area under the curve is 0.5. If we look in the body of the standard normal table for an area of 0.3962, we find that value at the intersection of the 13th row and 7th column, which corresponds to a z value of 1.26. Since z is located to the left of 0, it has to be negative, hence z = -1.26.

[Figure: standard normal curve with a shaded area of 0.8962; z is marked to the left of 0]
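Reading the table "backwards" corresponds to the inverse of the cumulative area function. The sketch below assumes, as the explanation implies, that the shaded area of 0.8962 lies to the right of z, so the area to the left of z is 1 - 0.8962. It uses `NormalDist.inv_cdf` from the standard library (Python 3.8 or later).

```python
from statistics import NormalDist

shaded_area = 0.8962            # area to the right of z (assumed from the explanation)
left_area = 1 - shaded_area     # area to the left of z

# inv_cdf returns the z value whose left-hand area equals left_area
z = NormalDist().inv_cdf(left_area)
print(f"z = {z:.2f}")
```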

Page 42: Chapter 9

The Normal Distribution

Section 6-4 Applications of the Normal Distribution

To solve problems for a normally distributed variable with μ ≠ 0 or σ ≠ 1, we MUST transform the variable to a standard normal variable; that is, P(x1 < X < x2) becomes P(z1 < Z < z2), which allows us to use the standard normal table. The transformation is

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$

Page 43: Chapter 9

The Normal Distribution

Example: A survey found that people keep their television sets an average of 4.8 years. The standard deviation is 0.89 year. If a person decides to buy a new TV set, find the probability that he or she has owned the old set for the following amounts of time. Assume the variable is normally distributed.

(a) Less than 2.5 years
(b) Between 3 and 4 years
(c) More than 4.2 years

μ = 4.8, σ = 0.89

(a) P(x < 2.5) becomes P(z < -2.58) because z = (2.5 - 4.8)/0.89 = -2.58.

The table area at z = 2.58 is 0.4951; therefore P(z < -2.58) = 0.5 - 0.4951 = 0.0049.

[Figure: standard normal curve with the area to the left of z = -2.58 shaded]

Page 44: Chapter 9

The Normal Distribution

(b) P(3 < X < 4) becomes P(-2.02 < z < -0.90) because z = (3 - 4.8)/0.89 = -2.02 and z = (4 - 4.8)/0.89 = -0.90.

From the standard normal table, at z = 2.02 we get 0.4783 and at z = 0.90 we get 0.3159, so P(-2.02 < z < -0.90) = 0.4783 - 0.3159 = 0.1624.

[Figure: standard normal curve with the area between z = -2.02 and z = -0.90 shaded]

Page 45: Chapter 9

The Normal Distribution

(c) P(x > 4.2) becomes P(z > -0.67) because z = (4.2 - 4.8)/0.89 = -0.67.

From the standard normal table, at z = 0.67 we get 0.2486, so P(z > -0.67) = 0.2486 + 0.5 = 0.7486.

[Figure: standard normal curve with the area to the right of z = -0.67 shaded]
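All three parts of the television example can be verified in a few lines. To match the table-based answers, this sketch rounds each z value to two decimal places before looking up the area, as the slides do; skipping the rounding would shift the results slightly. The helper name `z_score` is illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 4.8, 0.89   # mean and standard deviation from the example
Z = NormalDist()        # standard normal distribution

def z_score(x: float) -> float:
    """Transform x to a standard normal value, rounded as in a z table."""
    return round((x - MU) / SIGMA, 2)

p_a = Z.cdf(z_score(2.5))                      # (a) less than 2.5 years
p_b = Z.cdf(z_score(4)) - Z.cdf(z_score(3))    # (b) between 3 and 4 years
p_c = 1 - Z.cdf(z_score(4.2))                  # (c) more than 4.2 years

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {p_c:.4f}")
```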

Page 46: Chapter 9

The Normal Distribution

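Review Exercises #9

Area (percentage) = 0.5, μ = 100, σ = 15

Half of the area (0.25) lies on each side of the mean. The closest table area, 0.2486, corresponds to z = 0.67, so the z values are -0.67 and 0.67.

We can find the x values that correspond to the z values by using the same transformation equation.

-0.67 = (x - 100)/15 and 0.67 = (x - 100)/15

15(-0.67) = x - 100 and 15(0.67) = x - 100

x = 89.95 and x = 110.05

Therefore the lowest and highest scores bounding the middle 50% of the area are 89.95 and 110.05 (89.95 < x < 110.05).

[Figure: standard normal curve with the area between z = -0.67 and z = 0.67 shaded]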
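Solving the transformation equation for x gives x = μ + zσ. A minimal sketch of that rearrangement with the values from this exercise (the function name is illustrative):

```python
MU, SIGMA = 100, 15   # mean and standard deviation from the exercise

def x_from_z(z: float) -> float:
    """Invert z = (x - mu) / sigma to recover x."""
    return MU + z * SIGMA

low = x_from_z(-0.67)
high = x_from_z(0.67)
print(f"{low:.2f} < x < {high:.2f}")
```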