Chapter 1: Exploring Data

Stats chapter 1




1.1 DISPLAYING DATA WITH GRAPHS


Categorical variables

Bar graphs
• Recall that the horizontal axis shows the category names and the vertical axis shows the count or percentage.

Create a bar graph of “mobile phone carrier” for the students in this class period.

Start with a survey!
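Once the survey answers are written down, the bar heights are just the count (or percentage) for each carrier. A minimal Python sketch, assuming one answer string per student; the helper name `tally` and the eight answers below are hypothetical, not real class data.

```python
from collections import Counter

def tally(responses):
    """Return (category, count, percent) for each category, sorted by name.

    These are the bar heights for a bar graph of a categorical variable.
    """
    counts = Counter(responses)
    total = len(responses)
    return [(cat, n, 100 * n / total) for cat, n in sorted(counts.items())]

# Hypothetical survey answers (illustration only, not real class data)
answers = ["ATT", "Other", "ATT", "Other", "Other", "Other", "Other", "ATT"]
for category, count, pct in tally(answers):
    print(f"{category:<6}{count:>3}{pct:>7.1f}%")
```

Either the count or the percent column can be used for the vertical axis.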


Categorical Variables

Pie Chart
• The area of each slice reflects the relative frequency of the category the slice represents.
– e.g., if “ATT” is used by 25% of the class, the ATT slice must be 25% of the entire pie.
• Remember: all categories must be represented in the pie.

Typically, these are not fun to create
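One way to make them less painful: a slice's area is proportional to its central angle, so a category with relative frequency p gets p × 360°. A short Python sketch, assuming the counts come as a dictionary; `slice_angles` is a hypothetical helper, and the class of 20 in the usage line is made up so that ATT is 25%, as in the slide's example.

```python
def slice_angles(counts):
    """Return {category: central angle in degrees} for a pie chart.

    Every category must be included, so the angles add up to 360.
    """
    total = sum(counts.values())
    return {cat: 360 * n / total for cat, n in counts.items()}

# Hypothetical class of 20 in which 5 students (25%) use ATT
angles = slice_angles({"ATT": 5, "Other": 15})
print(angles)  # {'ATT': 90.0, 'Other': 270.0}
```

A 25% category gets a 90° slice, one quarter of the pie.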


Quantitative Data

Stemplot (a.k.a. “Stem and Leaf Plot”)
A stemplot displays the shape of a distribution while keeping every individual observation visible.

Preview the example on pg 43!


Quantitative Data

Stemplot steps
1. Arrange the observations in numerical order.
2. Separate each observation into a stem and a leaf.
3. Write the stems in a vertical column.
4. Write the leaf of each observation next to its stem. Leaves closest to the stem are lowest in numerical value.


Quantitative Data

The following measurements are the number of points scored by THS football in each game of the 2009 season.

42, 27, 19, 14, 20, 47, 53, 28, 32, 30, 44, 20


Quantitative Data

Stemplot steps
1. Arrange the observations in numerical order.

14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53


Quantitative Data

Stemplot steps
2. Separate each observation into a stem and a leaf.

1/4, 1/9, 2/0, 2/0, 2/7, 2/8, 3/0, 3/2, 4/2, 4/4, 4/7, 5/3


Quantitative Data

Stemplot steps
3. Write the stems in a vertical column.

1/4, 1/9, 2/0, 2/0, 2/7, 2/8, 3/0, 3/2, 4/2, 4/4, 4/7, 5/3

1
2
3
4
5


Quantitative Data

4. Write the leaf of each observation next to its stem. Leaves closest to the stem are lowest in numerical value.

1/4, 1/9, 2/0, 2/0, 2/7, 2/8, 3/0, 3/2, 4/2, 4/4, 4/7, 5/3

1 | 4, 9
2 | 0, 0, 7, 8
3 | 0, 2
4 | 2, 4, 7
5 | 3

YAY!
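The four stemplot steps translate almost line for line into code. A minimal Python sketch, assuming non-negative whole-number observations where the stem is every digit except the last and the leaf is the last digit (the split used in the THS example); `stemplot` is a hypothetical helper name.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot rows such as '1 | 4, 9' (assumes whole numbers >= 0)."""
    leaves = defaultdict(list)
    for x in sorted(data):               # step 1: numerical order
        stem, leaf = divmod(x, 10)       # step 2: stem = tens part, leaf = last digit
        leaves[stem].append(leaf)
    rows = []
    lowest, highest = min(leaves), max(leaves)
    for stem in range(lowest, highest + 1):  # step 3: every stem, even empty ones
        row = ", ".join(str(leaf) for leaf in leaves[stem])  # step 4: leaves in order
        rows.append(f"{stem} | {row}".rstrip())
    return rows

scores = [42, 27, 19, 14, 20, 47, 53, 28, 32, 30, 44, 20]
for row in stemplot(scores):
    print(row)
```

Because the data are sorted first, each stem's leaves come out lowest-first, matching the rule that leaves closest to the stem are smallest.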


Quantitative Data

Histogram
• A histogram is similar to a bar graph, but it is used only for quantitative data.
• Observations are separated into classes (number ranges).
– All classes must have equal width.
• Like a bar graph, the height of each bar represents the count for each class.
• Example 1.6 on pg 49


Quantitative Data

Histogram
Let’s use the same data from our previous example:

14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53


Quantitative Data

Histogram
1. Separate the range into classes of equal width. Let’s try the following:
0 ≤ score ≤ 14
15 ≤ score ≤ 29
30 ≤ score ≤ 44
45 ≤ score ≤ 60


Quantitative Data

Histogram
2. Count the number of individuals in each class:

Class | Count
0 ≤ score ≤ 14 | 1
15 ≤ score ≤ 29 | 5
30 ≤ score ≤ 44 | 4
45 ≤ score ≤ 60 | 2
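Counting by class is easy to double-check in code. A minimal Python sketch, assuming each class is a (low, high) pair with both ends included, as in the table; `class_counts` is a hypothetical helper.

```python
def class_counts(data, classes):
    """Count the observations in each class.

    `classes` is a list of (low, high) pairs; both ends are inclusive,
    matching classes written as 0 <= score <= 14.
    """
    return [sum(low <= x <= high for x in data) for low, high in classes]

scores = [42, 27, 19, 14, 20, 47, 53, 28, 32, 30, 44, 20]
classes = [(0, 14), (15, 29), (30, 44), (45, 60)]
print(class_counts(scores, classes))  # [1, 5, 4, 2]
```

The counts should add up to the number of observations (12 here); if they don't, some value fell between classes or into two of them.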


Quantitative Data

Histogram
3. Draw and label each axis.

[Figure: blank histogram axes; vertical axis “COUNT” (0 to 6), horizontal axis “Number of points scored” (0 to 60)]


Quantitative Data

Histogram
4. Draw each bar to the correct height.

[Figure: completed histogram; vertical axis “COUNT” (0 to 6), horizontal axis “Number of points scored” (0 to 60)]
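Before drawing, you can sanity-check the bar heights by printing them as sideways text bars. A rough Python sketch; `text_histogram` is a hypothetical helper, the class labels are the table's classes written with hyphens, and this is only a quick check, not a substitute for a properly labeled histogram.

```python
def text_histogram(labels, counts, symbol="#"):
    """Return one text row per class; the bar length equals the class count."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * n}" for label, n in zip(labels, counts)]

labels = ["0-14", "15-29", "30-44", "45-60"]
counts = [1, 5, 4, 2]  # counts from the class table
for row in text_histogram(labels, counts):
    print(row)
```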


Assignment 1A

• 1.1-1.12 all
• Starts on pg 46


Examining Distributions

• Look for the overall pattern and for any deviations from that pattern.

• In written work, you must describe C.U.S.S.:
– Center
– Unusual features (outliers)
– Shape
– Spread

• Note: CUSS is just a mnemonic device. It is customary to discuss “unusual features” last.


Examining Distributions

Center: we will discuss this at greater length later. For now, you can use the median as a measure of center.
Spread: also discussed later. For now, give the minimum and maximum values to describe spread.


Examining Distributions

Shape: we generally want to know two things.
1. How many peaks? Is it unimodal (one distinct peak) or is it uniform (no distinct peaks)?
2. Is the distribution symmetric (both tails are approximately equal) or skewed (one of the tails is longer)?
– Left skewed: the left tail is longer.
– Right skewed: the right tail is longer.


Examining Distributions

Outliers: like many things in statistics, identifying outliers can be a judgment call. Although we will learn a customary formula for determining outliers, the formula is arbitrary.

• In a histogram, outliers will be clearly separated from the rest of the observations

• Because class widths can be arbitrary, be sure to thoroughly examine the data before classifying an observation as an outlier.

• Do not ignore or delete outlier observations!


Relative Freq. and Cumulative Freq.

Let’s return to THS Football ’09:

Class | Count
0 ≤ score ≤ 14 | 1
15 ≤ score ≤ 29 | 5
30 ≤ score ≤ 44 | 4
45 ≤ score ≤ 60 | 2


Relative Freq. and Cumulative Freq.

We will add a column to show relative frequency.

Yes, “relative frequency” is essentially a “percentage”: each class count divided by the total number of observations (here, 12), times 100. At this point, you could make a histogram using relative frequencies, if desired.

Score | Count | Rel. Freq. (%)
0 to 14 | 1 | 8
15 to 29 | 5 | 42
30 to 44 | 4 | 33
45 to 60 | 2 | 17


Relative Freq. and Cumulative Freq.

Now add a column to show cumulative frequency.

Yes, just keep adding the next rel. freq. The last cell in the column should be 100, unless there is round-off error (not a big deal).

Score | Count | Rel. Freq. (%) | Cum. Freq. (%)
0 to 14 | 1 | 8 | 8
15 to 29 | 5 | 42 | 50
30 to 44 | 4 | 33 | 83
45 to 60 | 2 | 17 | 100
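Both new columns can be generated from the counts alone. A minimal Python sketch; `freq_table` is a hypothetical helper that rounds to whole percents like the table does.

```python
def freq_table(counts):
    """Return (count, rel. freq. %, cum. freq. %) rows, rounded to whole percents.

    The cumulative column is computed from the running count total rather than
    by adding rounded percents, so the last entry is always exactly 100.
    """
    total = sum(counts)
    rows, running = [], 0
    for n in counts:
        running += n
        rows.append((n, round(100 * n / total), round(100 * running / total)))
    return rows

for row in freq_table([1, 5, 4, 2]):
    print(row)
```

Working from the running count is a small design choice that sidesteps the round-off issue mentioned above; for the THS data both approaches give 8, 50, 83, 100.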


Relative Freq. and Cumulative Freq.

To create a “Cumulative Frequency Plot” or “Ogive,” start by creating axes similar to a histogram. The vertical axis is percentage and should be labeled 0 to 100%.

[Figure: blank ogive axes; vertical axis “Cumulative freq. (%)” (0 to 100), horizontal axis “Number of points scored” (0 to 60)]


Relative Freq. and Cumulative Freq.

Plot a point for each cum. freq. at the right boundary of its class. The left boundary of the first class is plotted at 0%. The last point plotted is the right boundary of the last class, at 100%.

[Figure: ogive axes with the plotted points; vertical axis “Cumulative freq. (%)” (0 to 100), horizontal axis “Number of points scored” (0 to 60)]


Relative Freq. and Cumulative Freq.

CONNECT THE DOTS!

[Figure: completed ogive; vertical axis “Cumulative freq. (%)” (0 to 100), horizontal axis “Number of points scored” (0 to 60)]


Relative Freq. and Cumulative Freq.

Some notes about ogives:
• It’s pronounced “Oh-Jives.”
• Ogives can be used to find approximate percentile ranks.
– The vertical axis is percentile!
• In particular, we are interested in:
– Median (50th percentile)
– First Quartile (25th percentile)
– Third Quartile (75th percentile)

The above vocab. will come up again. Memorize it!
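Reading a percentile off the ogive means going across from the vertical axis to the connected line, then down. Because the dots are joined by straight segments, that is linear interpolation. A minimal Python sketch, assuming each class's right boundary is its upper limit (14, 29, 44, 60) and using the Cum. Freq. column from the THS table; `ogive_points` and `value_at_percentile` are hypothetical helpers.

```python
def ogive_points(start, right_bounds, cum_pcts):
    """Points of an ogive: the left boundary of the first class at 0%,
    then one (right boundary, cumulative %) point per class."""
    return [(start, 0)] + list(zip(right_bounds, cum_pcts))

def value_at_percentile(points, pct):
    """Estimate the value at a percentile by following the straight line
    segments between consecutive ogive points (linear interpolation)."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < p1 and p0 <= pct <= p1:
            return x0 + (pct - p0) / (p1 - p0) * (x1 - x0)
    raise ValueError("percentile is outside the plotted range")

# Assumption: right boundaries 14, 29, 44, 60; percents from the Cum. Freq. column
points = ogive_points(0, [14, 29, 44, 60], [8, 50, 83, 100])
for name, pct in [("Q1", 25), ("Median", 50), ("Q3", 75)]:
    print(name, round(value_at_percentile(points, pct), 1))
```

These are estimates only (about 20.1, 29.0, and 40.4 here); the quartiles computed from the raw data in section 1.2 will differ somewhat, especially Q3.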


Assignment 1.1B

• P 64 #13-15, 21-25


1.2 DESCRIBING DATA WITH NUMB3RS


Measuring Center

• MEAN: calculated the same way you always calculate the mean (average).

• The symbol x̄ is read as “x-bar.”
• The mean is not a resistant measure of center: it is sensitive to a few extreme observations.

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$


Measuring Center

• Median: the “middle” number in a set of observations arranged in numerical order.

• If the data set has an even number of observations, then the median is the average of the two middle numbers.

• Unlike the mean, the median is a resistant measure of center.
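Resistance is easy to see by changing one extreme value. A short Python sketch using the standard library's `statistics.mean` and `statistics.median` on the THS 2009 scores; the 153 below is a hypothetical replacement for the top score, not real data.

```python
from statistics import mean, median

scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]
print(round(mean(scores), 2), median(scores))    # 31.33 29.0

# Hypothetical: replace the top score 53 with an extreme 153
extreme = scores[:-1] + [153]
print(round(mean(extreme), 2), median(extreme))  # 39.67 29.0
```

One extreme observation pulls the mean up by more than 8 points, while the median does not move at all.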


Measuring Spread

The Quartiles
• The median of the lower half of the ordered data (the observations to the left of the median) is the First Quartile (Q1).

• The median of the upper half of the ordered data (the observations to the right of the median) is the Third Quartile (Q3).

Notice that the median itself is not included in either half (this matters when there is an odd number of observations).
Q1 is the 25th percentile.
Q3 is the 75th percentile.


Measuring Spread

Recall the data from THS Football 2009:
14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53

We can number the positions to help:
01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12
14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53


Measuring Spread

01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12

14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53

Notice that the median is the average of 28 and 30 (positions 06 and 07).
Med. = 29


Measuring Spread

Lower half (positions 01 to 06): 14, 19, 20, 20, 27, 28
Q1 is the average of 20 and 20.

Upper half (positions 07 to 12): 30, 32, 42, 44, 47, 53
Q3 is the average of 42 and 44.

Q1 = 20, Med. = 29, Q3 = 43
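The "median of each half" rule can be coded directly. A minimal Python sketch; `quartiles` is a hypothetical helper that follows the rule on these slides. Note that Python's built-in `statistics.quantiles` uses a different interpolation rule by default, so it can give slightly different quartiles for the same data.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3): Q1 and Q3 are the medians of the lower and
    upper halves; with an odd count, the middle value is left out of both."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # observations left of the median
    upper = xs[half + n % 2:]     # skip the middle value when n is odd
    return median(lower), median(xs), median(upper)

scores = [42, 27, 19, 14, 20, 47, 53, 28, 32, 30, 44, 20]
print(quartiles(scores))  # (20.0, 29.0, 43.0)
```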


Measuring Spread

InterQuartile Range (IQR)
IQR is the preferred measurement of spread when the median is used to describe center.
IQR = Q3 - Q1

IQR = 43 – 20

IQR = 23


Measuring Spread

InterQuartile Range and Outliers
The previously mentioned formula for determining outlier observations depends on the IQR.
High outliers (outliers to the right): measurements greater than Q3 + 1.5 x IQR
Low outliers (outliers to the left): measurements less than Q1 - 1.5 x IQR


Measuring Spread

InterQuartile Range and Outliers
High outliers: greater than Q3 + 1.5 x IQR = 43 + 1.5 x 23 = 77.5, so any observation greater than 77.5
Low outliers: less than Q1 - 1.5 x IQR = 20 - 1.5 x 23 = -14.5, so any observation less than -14.5

Clearly, THS had no outlier football scores in 2009!
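The 1.5 x IQR rule is a two-line calculation plus a filter. A minimal Python sketch; `outlier_fences` and `find_outliers` are hypothetical helpers, and Q1 = 20 and Q3 = 43 are the quartiles found above.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (low fence, high fence) for the k x IQR rule (k = 1.5 here)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3):
    """Observations below the low fence or above the high fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]
print(outlier_fences(20, 43))         # (-14.5, 77.5)
print(find_outliers(scores, 20, 43))  # []
```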


Five Number Summary

A snapshot of a data distribution can be given with the 5 number summary:

Minimum, Q1, Median, Q3, Maximum

For the THS Football 2009 data, the five number summary is:

14, 20, 29, 43, 53


Five Number Summary

The 5 number summary is used to create a box plot (“box and whiskers” plot)

[Figure: box plot of the THS 2009 scores above a 0 to 60 number line, with Min, Q1, Med, Q3, and Max labeled]


Five Number Summary

BOX PLOT
• A number line must be included with a box plot.
• Outliers appear as unconnected dots.

[Figure: box plot drawn above a 0 to 60 number line]


Assignment 1C

• P74 #27-30, 32, 34, 37


The Standard Deviation

When the mean is used as the measure of center, the preferred measures of spread are the related measurements “variance” and “standard deviation.”
variance = s²

standard deviation = s

Yes, standard deviation is the square root of variance.


The Standard Deviation

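Formula for variance

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Yes, take the square root to find the std. dev.

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$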


The Standard Deviation

For the THS 2009 data:
Mean = 31.33
s² = [(14-31.33)² + (19-31.33)² + (20-31.33)² + (20-31.33)² + (27-31.33)² + (28-31.33)² + (30-31.33)² + (32-31.33)² + (42-31.33)² + (44-31.33)² + (47-31.33)² + (53-31.33)²] / (12-1)
s² = 1730.67 / 11
s² = 157.33
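The hand calculation can be checked in Python. A minimal sketch; `sample_variance` is a hypothetical helper written to mirror the formula, and it is compared with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1. The code uses the unrounded mean, so it agrees with the hand work to two decimals.

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(data):
    """s^2: the sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]
s2 = sample_variance(scores)
s = sqrt(s2)  # standard deviation is the square root of variance
print(round(s2, 2), round(s, 2))  # 157.33 12.54

# The statistics module's variance() and stdev() use the same n - 1 denominator
print(round(variance(scores), 2), round(stdev(scores), 2))  # 157.33 12.54
```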


The Standard Deviation

• Notice that the number s² = 157.33 is hard to interpret in terms of the data set: its units are squared points.

• However, s = 12.54 is in the same units as the data (points), so it has some meaning in our data.

• For many data sets (though not every one), the majority of observations are within one standard deviation of the mean. Here, most data is between 31.33 - 12.54 and 31.33 + 12.54, or between 18.79 and 43.87.
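You can check that claim for this data set by counting. A short Python sketch using `statistics.mean` and `statistics.stdev`.

```python
from statistics import mean, stdev

scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]
xbar, s = mean(scores), stdev(scores)

# Keep the observations between x-bar - s and x-bar + s (about 18.79 to 43.87)
within = [x for x in scores if xbar - s <= x <= xbar + s]
print(within)                          # [19, 20, 20, 27, 28, 30, 32, 42]
print(len(within), "of", len(scores))  # 8 of 12
```

Here 8 of the 12 scores (about 67%) fall within one standard deviation of the mean, so "most" holds for the THS data.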


Which measurements do I choose?

• Use “mean and standard deviation” when the data is reasonably symmetric with no outliers.

• Use “median and IQR” or the 5 num. sum. in cases where “mean and std. dev.” is not appropriate (e.g., skewed data or data with outliers).

• Remember: the “5 num sum” is resistant to outliers, while the “mean and std dev” are not resistant.


Linear Transformation of Data

• If every member of a data set is multiplied by a positive number b, then the measures of center and spread are also multiplied by b.

• If a constant a is added to every member of a data set, then a is added to the measures of center, but the measures of spread remain unchanged.


Linear Transformation of Data

Measurement | Old data | Transformed data
Observation | x | a + b*x
Mean | x̄ | a + b*x̄
Std. dev. | s | b*s
Median | Med | a + b*Med
InterQuart. Range | IQR | b*IQR
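The table's rules can be checked numerically. A minimal Python sketch, assuming hypothetical constants a = 5 and b = 2 (b must be positive); `iqr` and `summary` are hypothetical helpers, and the IQR uses the median-of-halves quartiles from these slides.

```python
from statistics import mean, median, stdev

def iqr(data):
    """IQR = Q3 - Q1, with Q1/Q3 as medians of the lower/upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[half + len(xs) % 2:]) - median(xs[:half])

def summary(data):
    """Center (mean, median) and spread (std. dev., IQR) of a data set."""
    return {"mean": mean(data), "median": median(data),
            "std. dev.": stdev(data), "IQR": iqr(data)}

a, b = 5, 2  # hypothetical constants (b must be positive)
scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]
transformed = [a + b * x for x in scores]

old, new = summary(scores), summary(transformed)
# What the table predicts: centers become a + b*(old), spreads become b*(old)
predicted = {"mean": a + b * old["mean"], "median": a + b * old["median"],
             "std. dev.": b * old["std. dev."], "IQR": b * old["IQR"]}
for name in old:
    print(f"{name:<10} old {old[name]:7.2f}  new {new[name]:7.2f}  rule {predicted[name]:7.2f}")
```

The "new" and "rule" columns should match; adding a shifts only the centers, while multiplying by b scales everything.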


Comparing Data Sets

• The AP Exam always asks students to compare data.

• Clearly identify the populations that are being compared

• Make sure to compare each part of CUSS.

• Make reference to the measurement you are comparing
– e.g., use “mean” and not “center”

• Give the values of the measurements you are comparing.

• Make use of comparison phrases “is greater than” “is less than”


Assignment 1D

• P89 #39-41, 45-47
