+ CHAPTER 2 Descriptive Statistics SECTION 2.1 FREQUENCY DISTRIBUTIONS


+Section 2.1: Frequency Distributions and Their Graphs

GOAL: explore many ways to organize and describe a data set: center, variability (or spread), and shape

+FREQUENCY DISTRIBUTION

A table that shows classes or intervals of data entries with a count of the number of entries in each class. The frequency f of a class is the number of data entries in the class.

Frequency – how often

Distribution – how spread out/concentrated

Example: Pg. 40

+Example of a Frequency Distribution

Class      Frequency, f
1 – 5      5
6 – 10     8
11 – 15    6
16 – 20    8
21 – 25    5
26 – 30    4

Lower Class Limit – least number that can belong to a class

Upper Class Limit – greatest number that can belong to a class

Class Width – the distance between lower (or upper) limits of consecutive classes

Range – difference between the maximum and minimum data entries

+Guidelines for Creating a Frequency Distribution

1. Determine the range of the data.

2. Determine the number of classes to use.

3. Determine the class width.

4. Find Class Limits.

5. Find the Class Midpoints.

6. Find the Class Boundaries.

7. Tally up the data in each class.

8. Get the FREQUENCY for each class.
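A minimal Python sketch of steps 1 to 4 and 7 to 8, for whole-number data. The data list and the function name `frequency_distribution` are hypothetical, and rounding the class width UP to a whole number is one common convention assumed here, not something these notes specify.

```python
import math

def frequency_distribution(data, num_classes):
    """Return (lower limit, upper limit, frequency) rows for whole-number data."""
    data_range = max(data) - min(data)            # step 1: range
    # Step 3: range / number of classes, rounded UP (assumed convention).
    width = math.ceil(data_range / num_classes)
    if width * num_classes <= data_range:         # range divided evenly:
        width += 1                                # widen so the maximum still fits
    rows = []
    lower = min(data)                             # step 4: first lower class limit
    for _ in range(num_classes):
        upper = lower + width - 1                 # upper class limit
        freq = sum(1 for x in data if lower <= x <= upper)  # steps 7-8: tally
        rows.append((lower, upper, freq))
        lower += width                            # next lower class limit
    return rows

# Hypothetical data set with minimum 1 and maximum 30, split into 6 classes.
data = [1, 4, 5, 9, 10, 10, 13, 17, 20, 22, 26, 30]
rows = frequency_distribution(data, 6)
for lower, upper, freq in rows:
    print(f"{lower} – {upper}: {freq}")
```

Midpoints and boundaries (steps 5 and 6) are left out to keep the sketch short.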

+Definitions – Additional Features of Frequency Distributions

Class Midpoint – sum of the lower and upper limits of a class, divided by two (also known as the class mark)

Relative Frequency – portion or percentage of the data that falls in that class. Take the frequency (f) divided by the sample size (n).

Cumulative Frequency – sum of the frequency for that class and all previous classes. The cumulative frequency of the last class is equal to the sample size n
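These three definitions can be checked in a few lines of Python. This sketch uses the class limits and frequencies from the example frequency distribution earlier in this section; the variable names are illustrative only.

```python
from itertools import accumulate

# Class limits and frequencies from the example frequency distribution.
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
freqs = [5, 8, 6, 8, 5, 4]

n = sum(freqs)                                      # sample size
midpoints = [(lo + hi) / 2 for lo, hi in classes]   # (lower + upper) / 2
relative = [f / n for f in freqs]                   # f / n
cumulative = list(accumulate(freqs))                # running total of f

for (lo, hi), f, m, r, c in zip(classes, freqs, midpoints, relative, cumulative):
    print(f"{lo} – {hi}  f={f}  midpoint={m}  relative={r:.4f}  cumulative={c}")
```

The last cumulative frequency equals n, and the relative frequencies add up to 1.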

+Class Example 1 Page 41

+Class Activity/HW

Pg. 51 #27, #28

We’ll be using these frequency distributions again, so make sure to hold onto them.

HAVE DONE FOR TOMORROW, WE NEED THEM!

DO ON SEPARATE PAPER

+Graphs of Frequency Distributions

Frequency Histogram – a bar graph that represents the frequency distribution of a data set

Properties of a Frequency Histogram

1. The horizontal scale is quantitative and measures the data values

2. The vertical scale measures the frequencies of the classes

3. Consecutive bars MUST touch

+Other Types of Graphs

FREQUENCY POLYGON: A line graph that emphasizes the continuous change in frequencies

RELATIVE FREQUENCY HISTOGRAM: Has the same shape and horizontal scale as the frequency histogram; the vertical scale measures RELATIVE frequencies

CUMULATIVE FREQUENCY GRAPH (OGIVE): A line graph that displays the cumulative frequency of each class at its upper class boundary

+#27: Newspaper Reading Times (min)

Class      Frequency   Midpoint   Relative f   Cumulative f
0 – 7      8           3.5        0.32         8
8 – 15     8           11.5       0.32         16
16 – 23    3           19.5       0.12         19
24 – 31    3           27.5       0.12         22
32 – 39    3           35.5       0.12         25

n = 25

+Class Activity/HW

Using the Frequency Distribution you created for #28 from page 51, complete the following ON GRAPH PAPER:

1. Frequency Histogram

2. Frequency Polygon

3. Relative Frequency Histogram

4. Ogive

**MAKE SURE TO LABEL GRAPHS AND WRITE NEATLY!

(TURN IN WITH FREQUENCY DISTRIBUTION FOR WRITTEN FEEDBACK)

DUE TOMORROW!!!!

+#28 Book Spending Per Semester ($)

Class        Frequency   Midpoint   Relative f   Cumulative f
30 – 113     5           71.5       0.1724       5
114 – 197    7           155.5      0.2414       12
198 – 281    8           239.5      0.2759       20
282 – 365    2           323.5      0.0690       22
366 – 449    3           407.5      0.1034       25
450 – 533    4           491.5      0.1379       29

n = 29

+Pirate Baseball Activity: Due

Given: Pittsburgh Pirates Home Run Data 1961 – 2009

Using this data, create the following, USING EIGHT CLASSES:

1. Frequency Distribution (including ALL parts and rel./cum. freq)

2. Frequency Histogram

3. Frequency Polygon

4. Relative Frequency Histogram

5. Ogive

Must include:

Title, axis labels, equal class widths

Evidence of ALL calculations (class widths, boundaries, midpoints)

Straight lines

Neatness

Straight edge

Graph paper

Then, using your phone or an iPad, look up home run data for 2010, 2011, 2012, 2013, 2014, and 2015. Create:

A NEW Frequency Distribution

Two New Charts

An explanation of how this new data has changed the distribution (one paragraph)

THIS WILL BE GRADED.

Due:

Only given TODAY and TOMORROW to work in class.

+

Section 2.2: More Graphs and Displays

+Stem and Leaf Plot

Display for quantitative data

Gives the feel of a histogram while retaining data values

Easy way to sort data

Stem – the entry’s leftmost digits

Leaf – the entry’s rightmost digits

Examples 1 and 2 on Pages 55 – 56 (Ordered/Unordered)

MUST ALWAYS INCLUDE A KEY!

+Dot Plot

Each data entry is plotted, using a point, above a horizontal axis

Can see how data is distributed, see specific data entries, and identify unusual data values

Example 3 Pg. 57

+Graphing Qualitative Data Sets: Pie Charts

A circle that is divided into sectors that represent categories

Area of each sector is proportional to the category’s frequency

KEY: To find central angle: MULTIPLY RELATIVE FREQUENCY BY 360°
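A short Python sketch of the central-angle rule. The category names and counts below are hypothetical, made up only to show the arithmetic.

```python
# Hypothetical category frequencies (not from the textbook).
counts = {"Category A": 10, "Category B": 25, "Category C": 15}

total = sum(counts.values())

# Central angle = relative frequency * 360 degrees.
angles = {name: (f / total) * 360 for name, f in counts.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles always add up to 360°, which is a quick check before drawing the chart with a protractor.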

+Pareto Chart

A vertical bar graph where the height of each bar represents frequency or relative frequency

BARS ARE POSITIONED IN ORDER OF HIGHEST TO LOWEST

REMEMBER: Qualitative Data

Example 5 Page 59

+Graphing Paired Data Sets: Scatter Plot

Paired Data Sets: each entry in one data set corresponds to one entry in a second data set

Scatter Plot: ordered pairs are graphed as points in a coordinate plane

Use to SHOW THE RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES

Example 6 Page 60

+Time Series Chart

Used to graph a time series

Time series – data set composed of quantitative entries taken at regular intervals over a period of time

Example 7 Page 61

Scatter Plot: No Line

Time Series Chart: Connected data points

+GRADED ASSIGNMENT:

Individually, complete the following graphs from pages 64 – 65: #18, #20, #22, #24, #25, #29, #30

Must be handed in by the beginning of class on ________ (only ______ to work in class)

Will be graded for correctness and neatness

Use graph paper, ruler, protractor, and compass!

+

Section 2.3 - Measures of Central Tendency

+Measures of Central Tendency

MEAN, MEDIAN, MODE

A value that represents a TYPICAL, or CENTRAL, entry of the data

+Mean

Population Mean

μ = Σx / N

Sample Mean

x̄ = Σx / n

N = number of entries in a population

n = number of entries in a sample

+Example 1 Pg. 67

The prices (in dollars) for a sample of roundtrip flights from Chicago, Illinois to Cancun, Mexico are listed. What is the mean price of the flights?

872 432 397 427 388 782 397

WHEN CALCULATING, ROUND TO ONE MORE DECIMAL PLACE THAN THE ORIGINAL DATA
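One way to check the sample mean for the flight prices above, using Python's standard `statistics` module. Rounding to one decimal place follows the rounding rule on this slide, since the prices are whole dollars.

```python
from statistics import mean

# Flight prices (in dollars) from the example.
prices = [872, 432, 397, 427, 388, 782, 397]

total = sum(prices)                  # Σx
n = len(prices)                      # number of entries in the sample
sample_mean = mean(prices)           # x̄ = Σx / n
rounded_mean = round(sample_mean, 1) # one more decimal place than the data

print(f"Σx = {total}, n = {n}, sample mean ≈ {rounded_mean}")
```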

+Median

Value that lies in the middle of the data when the data is ORDERED

If the data set has an even number of entries, the median is the mean of the two middle data entries

Median divides a data set into TWO equal parts

EX: 4 5 6 8 10 14

+Mode

Most frequently occurring data point

If ALL occur only ONCE, then there is NO MODE

If two data entries are tied for the greatest frequency, then BOTH are modes and we have a BIMODAL DISTRIBUTION

If more than two modes, we have a MULTIMODAL DISTRIBUTION

+Note on Mode

Mode is the only measure of central tendency that MUST be an actual data point.
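A small Python sketch of the median and mode rules, applied to the median example (4 5 6 8 10 14) and the flight prices from the mean example. The helper name `modes` is made up for this sketch; it returns an empty list for "no mode" to match the rule on the Mode slide.

```python
from collections import Counter
from statistics import median

def modes(data):
    """Return every most-frequent value, or [] when all values occur once."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:                      # every entry occurs once: NO MODE
        return []
    return [value for value, c in counts.items() if c == top]

example = [4, 5, 6, 8, 10, 14]                     # even number of entries
prices = [872, 432, 397, 427, 388, 782, 397]       # flight prices

median_example = median(example)    # mean of the two middle entries, 6 and 8
median_prices = median(prices)      # middle entry after sorting
modes_prices = modes(prices)        # 397 occurs twice
modes_example = modes(example)      # no entry repeats

print(median_example, median_prices, modes_prices, modes_example)
```

Note that the median of the first list is not one of its data points, while a mode always is.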

+Outlier

Data point that is far away from all of the other data points

+Assignment: Part 1 Section 2.3

Pg. 75 – 78 #18 - #34 even

Finding mean, median, and mode.

Label any outliers.

Use correct notation for mean.

(population mean vs. sample mean)

+Today’s Question: How can we describe the “middle” of unequal data?

You have $200 for 17 days, $300 for 5 days, and $150 for 9 days out of a month. What was your average amount of money for the month?

+Weighted Mean

A mean where each data point is not “worth” the same amount.

Entries have varying “weights”.

x̄ = Σ(x * w) / Σw

**Where w is the weight of each entry

+Example: Weighted Mean Vs. Regular Mean

Tests are worth 50% of overall grade, quizzes 30% and homework 20%.

You get 100 on HW, 90 on a quiz, and 80 on a test.

Calculate regular and weighted mean.

Why is one lower than the other?

+Example: Weighted Mean Vs. Regular Mean

You have $200 for 17 days, $300 for 5 days, and $150 for 9 days out of a month.

Calculate regular and weighted mean.

Why is one lower than the other?
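Both examples can be worked with the weighted mean formula Σ(x * w) / Σw. This is a sketch; the function name `weighted_mean` is illustrative, and the "regular" mean is taken to be the plain average of the three listed values.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value * weight, divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Grades example: test 80 (50%), quiz 90 (30%), homework 100 (20%).
grades = [80, 90, 100]
grade_weights = [0.5, 0.3, 0.2]
grade_regular = sum(grades) / len(grades)             # about 90
grade_weighted = weighted_mean(grades, grade_weights) # about 87

# Money example: each dollar amount is weighted by the number of days held.
amounts = [200, 300, 150]
days = [17, 5, 9]
money_regular = sum(amounts) / len(amounts)           # about 216.67
money_weighted = weighted_mean(amounts, days)         # about 201.61

print(round(grade_regular, 1), round(grade_weighted, 1))
print(round(money_regular, 2), round(money_weighted, 2))
```

In both cases the weighted mean is lower because the largest weight sits on one of the lower values (the test score of 80, and the 17 days at $200).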

+Mean of a Frequency Distribution

x̄ = Σ(x * f) / n

where n = Σf, x is the class midpoint, and f is the frequency of each class

+Guidelines: Finding the Mean of a Frequency Distribution (Pg. 72)

1. Find the midpoint of each class.

2. Find the sum of the products of the midpoints and the frequencies: Σ(x * f)

3. Find the sum of the frequencies: n = Σf

4. Find the mean of the frequency distribution: x̄ = Σ(x * f) / n
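As an illustration, the guidelines can be applied to the #27 newspaper reading times table from Section 2.1. This is only a sketch with illustrative variable names; the result is an estimate of the mean, because each entry is represented by its class midpoint.

```python
# Class limits and frequencies from the #27 table (reading times in minutes).
classes = [(0, 7), (8, 15), (16, 23), (24, 31), (32, 39)]
freqs = [8, 8, 3, 3, 3]

midpoints = [(lo + hi) / 2 for lo, hi in classes]         # midpoint of each class
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))     # Σ(x * f)
n = sum(freqs)                                            # n = Σf
mean_estimate = sum_xf / n                                # Σ(x * f) / n

print(f"Σ(x*f) = {sum_xf}, n = {n}, mean ≈ {mean_estimate:.1f}")
```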

+The Shape of Distributions (Pg. 73)

Symmetric – can be folded in the middle so the two halves are approximately mirror images

Uniform – rectangular, equal frequencies

Multimodal – more than one peak

Skewed – a “long tail” on one side. The direction of the skew is the side the tail is on.

Left skewed means the tail is on the left side

Right skewed means the tail is on the right side

+EXAMPLES: Page 73

Mean describes data best when data is symmetric.

Median describes data best when data is skewed or contains outliers.

Mode describes data best when data is nominal level of measurement.

+Assignment: Part 2 Section 2.3

Pg. 77 – 78 #41-#44, #46 - #48, #52- #54

THIS IS A LENGTHY ASSIGNMENT, GET STARTED ON IT!!!

+

Section 2.4: Measures of Variation

+Find the mean, median, and mode.

SET A: 37, 38, 39, 41, 41,41, 42, 44, 45, 47

SET B: 23, 29, 32, 40, 41, 41, 48, 50, 52, 59

+

+Measures of Variation: Range, Deviation, Variance, Standard Deviation

Range = (Maximum Data Entry) – (Minimum Data Entry)

Range only uses two pieces of data

Variance and Standard Deviation use ALL entries of a data set

+Deviation

Deviation of an entry x in a POPULATION data set is the difference between the entry and the mean μ of the data set.

Deviation of x = x – μ (POPULATION)

Deviation of x = x – x̄ (SAMPLE)

DISTANCE FROM MEAN!

+Calculate Deviations of Company A

37, 38, 39, 41, 41,41, 42, 44, 45, 47

Find the sum of the deviations.

+POPULATION VARIANCE

For POPULATION DATA

σ^2 = Σ(x – μ)^2 / N

σ is the lowercase Greek letter Sigma

+Population Standard Deviation

Square root of the variance: σ = √(σ^2)

Measures the typical distance of entries from the mean

Larger standard deviation means more spread out data.

+Sample Variance and Sample Standard Deviation.

When using sample data, use x̄ instead of μ

Divide by n – 1 instead of N

s^2 = Σ(x – x̄)^2 / (n – 1), and s is the square root of s^2

+Calculate sample variance and standard deviation for Company B.

SET A: 37, 38, 39, 41, 41,41, 42, 44, 45, 47

SET B: 23, 29, 32, 40, 41, 41, 48, 50, 52, 59
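A sketch that checks these calculations with Python's `statistics` module. It treats Set A as population data (divide by N) and Set B as sample data (divide by n – 1); that split is an assumption made only to show both formulas.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

set_a = [37, 38, 39, 41, 41, 41, 42, 44, 45, 47]
set_b = [23, 29, 32, 40, 41, 41, 48, 50, 52, 59]

# Deviations of Set A: each entry minus the mean.
mean_a = mean(set_a)
deviations_a = [x - mean_a for x in set_a]
sum_dev_a = sum(deviations_a)        # deviations from the mean always sum to 0

# Population formulas (divide by N), applied to Set A.
pop_var_a = pvariance(set_a)
pop_sd_a = pstdev(set_a)

# Sample formulas (divide by n - 1), applied to Set B.
samp_var_b = variance(set_b)
samp_sd_b = stdev(set_b)

print(f"Set A: mean={mean_a}, sum of deviations={sum_dev_a}")
print(f"Set A: population variance={pop_var_a:.2f}, sd={pop_sd_a:.2f}")
print(f"Set B: sample variance={samp_var_b:.2f}, sd={samp_sd_b:.2f}")
```

Both sets have the same mean (41.5), but Set B has a much larger standard deviation, so its entries are more spread out.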

+

+Assignment: Part 1 Section 2.4

Pg. 92 – 94 #1, 3, 13, 14, 19, 20

+How can we use standard deviation to make decisions about data?

Standard deviation and variance tell us how spread out the data is

+Empirical Rule (68-95-99.7 Rule)

In a BELL-SHAPED distribution,

1. ~68% of data is within 1 Standard Deviation of mean

2. ~95% of data is within 2 Standard Deviations of mean

3. ~99.7% of data is within 3 Standard Deviations of mean

+

+Example:

If 65 men’s heights have a bell-shaped distribution with a mean of 68 inches and a standard deviation of 2.5 inches, what percent of the men are between 68 and 73 inches tall?

How many men is that?
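One way to reason through this example in Python. It assumes the bell-shaped distribution is symmetric, so half of the "within k standard deviations" percentage lies between the mean and k standard deviations above it. The variable names are illustrative.

```python
mean_height = 68.0
sd = 2.5
n_men = 65

# Empirical Rule: approximate percent of data within k standard deviations.
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}

# 73 inches is how many standard deviations above the mean?
k = (73 - mean_height) / sd                             # 2.0

# By symmetry, half of the 95% lies between the mean and +2 SD.
percent_between = EMPIRICAL_PERCENT[round(k)] / 2       # 47.5
men_between = percent_between / 100 * n_men             # about 30.9

print(f"about {percent_between}% of the men, roughly {round(men_between)} men")
```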

+Chebychev’s Theorem

In ANY distribution, the portion of data within k standard deviations of the mean (k > 1) is AT LEAST 1 – (1/k^2)

For k = 2:

For k = 3:

+Example:

A sample of 40 runners in a 1 mile race gave a mean of 7 minutes with a standard deviation of 1.25 minutes. What can we say about how many people ran a mile in between 4.5 and 9.5 minutes?
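A sketch of the theorem applied to k = 2, k = 3, and the runners example. The function name `chebyshev_min_fraction` is made up for illustration.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the theorem only applies for k > 1")
    return 1 - 1 / k**2

at_least_k2 = chebyshev_min_fraction(2)    # 0.75
at_least_k3 = chebyshev_min_fraction(3)    # about 0.889

# Runners: mean 7 minutes, standard deviation 1.25 minutes, 40 runners.
mean_time, sd, n_runners = 7, 1.25, 40
k = (9.5 - mean_time) / sd                 # 4.5 to 9.5 is the mean ± 2 SD
min_runners = chebyshev_min_fraction(k) * n_runners

print(f"k=2: at least {at_least_k2:.0%}; k=3: at least {at_least_k3:.1%}")
print(f"at least {min_runners:.0f} of the {n_runners} runners")
```

Unlike the Empirical Rule, this gives only a lower bound, but it holds for any shape of distribution.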

+Assignment: Part 2 Section 2.4

Pg. 95 – 97 #29 - #36 ONLY PART A

Pg. 88 has nice picture of Empirical Rule and Bell-Shaped Distributions

+

Section 2.5: Measures of Position

+Fractiles

Numbers that partition, or divide, an ORDERED data set into equal parts

Example: Median – Fractile that divides data set into two equal parts

+Quartiles

Three Quartiles: Q1, Q2, and Q3

Divide an ordered data set into four equal parts

Q1 – First Quartile – about one quarter of the data fall on or below Q1

Q2 – Second Quartile – about half of the data fall on or below Q2

Q2 is the MEDIAN of the data set

Q3 – Third Quartile – about ¾ of the data fall on or below Q3

+Interquartile Range

Difference between the third and first quartiles

IQR = Q3 – Q1

+Box-and-Whisker Plot

Five-Number Summary: Minimum, Q1, Median, Q3, Maximum

5, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 18, 20, 21, 37

What conclusion can we draw from graph?
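A sketch of the five-number summary for the data set above. It assumes the common classroom convention that Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle entry when the number of entries is odd; other quartile conventions can give slightly different values.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the halves."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]                               # entries below the median position
    upper = s[half + 1:] if n % 2 else s[half:]    # entries above the median position
    return (s[0], median(lower), median(s), median(upper), s[-1])

data = [5, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 18, 20, 21, 37]

minimum, q1, q2, q3, maximum = five_number_summary(data)
iqr = q3 - q1                                      # IQR = Q3 – Q1

print(f"min={minimum}, Q1={q1}, median={q2}, Q3={q3}, max={maximum}, IQR={iqr}")
```

The maximum (37) sits far from Q3 (18), so the right whisker of the box-and-whisker plot is much longer than the left one.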

+

+Assignment: Part 1 Section 2.5

Pg. 110 – 111 #17 - #20, #23, #26, #27, #28

+The Standard Score or Z-Score

Measures a data value’s position in the data set

The STANDARD SCORE or Z-SCORE represents the number of standard deviations a given value x falls from the mean μ. To find the z-score for a given value, use the following formula:

z = (Value – Mean) / (Standard Deviation) = (x – μ) / σ

+Z-Score

Can be POSITIVE, NEGATIVE, or ZERO

If z is NEGATIVE, then the corresponding x value is BELOW the mean.

If z is POSITIVE, then the corresponding x value is ABOVE the mean.

If z is ZERO, then the corresponding x value is the MEAN.

+Z-Score Example

Mean speed of vehicles is 56 MPH.

Standard Deviation of 4 MPH.

Car 1: 62 MPH

Car 2: 47 MPH

Car 3: 56 MPH

Calculate the z-score for Cars 1, 2, and 3.

Interpret this information.
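A quick Python check of the three z-scores, using z = (x – μ) / σ. The function name `z_score` and the dictionary layout are illustrative only.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

mean_speed = 56   # MPH
sd_speed = 4      # MPH

speeds = {"Car 1": 62, "Car 2": 47, "Car 3": 56}
z_scores = {car: z_score(x, mean_speed, sd_speed) for car, x in speeds.items()}

for car, z in z_scores.items():
    if z > 0:
        position = "above the mean"
    elif z < 0:
        position = "below the mean"
    else:
        position = "equal to the mean"
    print(f"{car}: z = {z} ({position})")
```

Car 1 is 1.5 standard deviations above the mean, Car 2 is 2.25 standard deviations below it, and Car 3 is exactly at the mean.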

+

+Z-Scores PLUS the Empirical Rule

Empirical Rule: about 95% of data lies within 2 Standard Deviations of the mean

Z-Score: about 95% of data has a z-score between -2 and 2. These are USUAL scores.

A z-score less than -2 or greater than 2 we would consider unusual.

A z-score less than -3 or greater than 3 we would consider VERY unusual.

REMEMBER – BELL-Shaped for Empirical Rule

+Assignment: Part 2 Section 2.5

Pg. 111 - 112 #29 - #34

+Section 2.3 Part 1 (Mean, Median, Mode)

18. 6.2, 6, 5
20. 200.4, 186, none
22. 61.2, 55, 80 and 125
24. NP, NP, worse
26. NP, NP, domestic
28. 16.6, 15, none
30. 314.1, 374, none
32. 2.49, 2.35, 4.0
34. 213.4, 214, 217

Section 2.3 Part 2

41. 89
42. 36320
43. 612.73
44. 982.19
46. 84
47. 65
48. 69.7
52. Skewed Right
53. Symmetric
54. Uniform

Section 2.4 Part 1

1. R = 8, M = 7.9, V = 6.1, SD = 2.5
3. R = 12, M = 11.9, V = 17.1, SD = 4.1
19. LA: R = 17.6, V = 37.5, SD = 6.11; LB: R = 8.7, V = 8.71, SD = 2.95
20. Dallas: R = 18.1, V = 37.33, SD = 6.11; Houston: R = 13, V = 12.26, SD = 3.5

Section 2.4 Part 2

29. 68%
30. Between 1500 and 3300
31. a. 51, b. 17
32. a. 38, b. 19
33. 1000, 2000
34. 3325, 1490
35. 24
36. Sentences involving 54.97 and 59.17

Section 2.5 Part 1

17. None
18. SR
19. SL
20. S
23. Q1 = 2, Q2 = 4, Q3 = 5
26. Q1 = 15.125, Q2 = 15.8, Q3 = 17.65
27. a. 5, b. 50%, c. 25%
28. a. 17.65, b. 50%, c. 50%

Section 2.5 Part 2

31. Stats: 1.43, Bio: 0.77. Did better on Stats
32. Stats: -0.43, Bio: -0.77. Did better on Stats
33. Stats: 2.14, Bio: 1.54. Did better on Stats
34. Both 0. Both performed equally.
