Copyright © 2014, 2012, 2009 Pearson Education, Inc.

Lecture Notes Number 1

INTRO STATS - 4TH EDITION: CHAPTERS 1 – 4

INTRO STATS – 3RD EDITION: CHAPTERS 1 – 5

What Is Statistics?

Statistics is the science of (1) collecting, (2) organizing, (3) summarizing, and (4) analyzing data to answer questions and/or draw conclusions.

Two Different Meanings of Statistics

(a) In the plural sense, statistics means a set of observations, usually collected by measuring or counting, collectively known as data.

(b) In the singular sense, statistics refers to a group of scientific methods used for (i) collecting data, (ii) interpreting and analyzing data, and (iii) making conclusions or inferences.

Why Do We Care About Statistics?

Statistics allows us to:

•Explore the world around us.
•Use evidence to check whether our beliefs are true.
•Find patterns that lead to discoveries.
•Share new discoveries with others.

Keep in Mind
•Statistics must be used carefully. Inappropriate use will result in inaccurate beliefs.
•Results are always uncertain.

Three Main Aspects of Statistics

Design: Planning how to obtain data to answer the questions of interest (data collection).

Description: Exploring and summarizing patterns in the data (data analysis).

Inference: Making decisions and predictions based on the data. To infer means to arrive at a decision or prediction by reasoning from known evidence.

Types Of Statistics

Descriptive statistics: the methods involving the collection, presentation, and characterization of a set of data in order to properly describe the various features of that data set. To achieve this, statisticians use tables (either frequency or contingency tables), bar and pie charts, stem-and-leaf displays, box-and-whisker plots, Pareto diagrams, and histograms.

Descriptive statistics is also defined as the branch of statistics that involves organizing, displaying, and describing data.

Inferential statistics: the branch of statistics that involves drawing conclusions about a population based on information contained in a sample from that population.

Think, Show, and Tell

•Think about what information you want to know.

•Show your results by displaying the data in a professional and accurate manner.

•Tell your story by describing what can be concluded from the data that was collected.

Some Relevant Statistical Terminology

What Is Data?

• Data: Any collection of numbers, characters, images, or other items that provide information about something

• Data vary: Surveys and experiments produce a variety of outcomes.

Statistics helps us make sense of the data and how the data vary.

What Is Data?

Data is systematically recorded information, whether numbers or labels, together with its context.

Context tells who and what was measured, when and where, how, and why.

The Six “W”s

•Who: Describe the individuals who were surveyed.
•What: Determine what is being measured.
•When: When was the research conducted?
•Where: Where was the research conducted?
•Why: What was the purpose of the survey or experiment?
•How: Describe how the survey or experiment was conducted.

Example

Because of the difficulty of weighing a bear in the woods, researchers caught and measured 54 bears, recording their weight, neck size, length, and sex. They hoped to find a way to estimate the weight from the other, more easily determined quantities. Identify the W’s.

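Solution

•Who: 54 bears
•What: weight, neck size, length, and sex
•When: not specified
•Where: in the woods (no specific location is given)
•Why: to find a way to estimate a bear’s weight from the other, more easily determined quantities
•How: researchers caught and measured the bears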

Who and What

•Respondents: Individuals who answer a survey (e.g., customers at Amazon)

•Subjects or Participants: People who are experimented on (e.g., patients who receive a new medication)

•Experimental Units: The objects of an experiment when they are not people (e.g., rats that run through a maze)

•Records: Rows in a database (e.g., each person’s purchase record at Amazon)

Population Versus Sample

Population
•Collection of all data values that ever will occur for a group
•Often difficult to obtain all of this information

Sample
•A subset of the population
•Represents the population at large
•Easier to obtain this information

Population Versus Sample

•The goal of collecting a sample is to describe the population.

•The population is usually impractical or impossible to collect.

•A sample is used to make inferences about the population.

•The sample should be representative of the population.

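Population Versus Sample

Illustration: A pot of chicken soup. Tasting one spoonful (the sample) from a well-stirred pot tells you about the whole pot (the population).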

Examples: Identify the population and sample

(i) A question posted on the Lycos website in the USA on 18 June 2000 asked visitors to the site to say whether they thought marijuana should be legally available for medicinal purposes.

(ii) The Gallup Poll interviewed 1007 randomly selected U.S. adults aged 18 and older, March 23 – 25, 2007. Gallup reports that when asked when, if ever, the effects of global warming will begin to happen, 60% of the respondents said the effects had already begun. Only 11% thought that they would never happen.

Parameter Versus Statistic

• Parameter (or population parameter): A numerical summary of the population.

• Statistic (or sample statistic): A numerical summary of a sample taken from a population.

• Illustration:
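
A minimal Python sketch of the distinction, assuming a small hypothetical population (the ages of all 15 members of a club; the numbers are made up for illustration): the population mean is a parameter, and the mean of a simple random sample drawn from that population is a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: ages of all 15 members of a small club (made-up numbers).
population = [18, 19, 21, 22, 22, 25, 27, 30, 31, 34, 38, 41, 45, 52, 60]

# Parameter: a numerical summary of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same numerical summary computed from a sample of that population.
rng = random.Random(1)               # fixed seed so the run is repeatable
sample = rng.sample(population, 5)   # simple random sample, drawn without replacement
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

Drawing a different sample (for example, by changing the seed) usually changes the statistic, while the parameter stays fixed.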

Outliers

• Outliers are unusual or extreme values that do not appear to belong with the rest of the data.

• Such stragglers stand apart from the body of the distribution of the data set.

• Outliers can affect many statistical analyses, so you should always be alert for them.

Variables

Definition: The characteristics recorded about each individual are called variables.

There are two main types of variables: categorical and quantitative.

Categorical Variables

•Categorical Variable: A variable that tells us what group or category an individual belongs to

•Synonyms: nominal and qualitative

•Examples: Favorite color, country of birth, area code

•Drawback of Categorical Variables: Challenging to analyze with computation

Quantitative Variables

•Quantitative Variable: Contains measured numerical values with measurement units

•Typically records the amount or degree of something

•Unit Examples: ounces, dollars, degrees Fahrenheit

Quantitative Variables

Discrete Quantitative Variable

A variable is discrete if it takes its values from a countable set of numbers like {0, 1, 2, 3, 4, …} or from a finite set like {3, 4.5, 6, 11}.

Continuous Quantitative Variable

A variable is continuous if it takes its possible values from an interval or a continuum like [2, 7], (-5, 10), or the entire real number line, R.

Categorical or Quantitative Variable?

• Amazon knows your age and will use it to present an age-appropriate image customized for you.

• Is Age categorical or quantitative?

• When age is grouped into categories such as Child, Teen, Young Adult, Middle Aged, and Senior, it is categorical.

• When age is recorded as a number of years rather than grouped into categories, it is quantitative.
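
The same variable can be recorded either way. The short Python sketch below converts numeric ages (quantitative, in years) into the age groups named above (categorical). The cutoff ages are hypothetical, chosen only for illustration; the notes do not specify any.

```python
def age_group(age):
    """Map a numeric age (quantitative) to a group label (categorical).

    The cutoffs are hypothetical, chosen only to illustrate the idea.
    """
    if age < 13:
        return "Child"
    if age < 20:
        return "Teen"
    if age < 40:
        return "Young Adult"
    if age < 65:
        return "Middle Aged"
    return "Senior"

ages = [8, 16, 23, 35, 47, 70]          # quantitative: numbers with units (years)
groups = [age_group(a) for a in ages]   # categorical: one group label per person
print(groups)
```

Once ages are grouped, the individual numbers are lost, so computations such as an average age are no longer possible from the labels alone.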

Identifiers

• Identifier Variable: A variable that is used to uniquely identify the individual. It does not describe the individual. Examples:
•Login ID
•Customer Number
•Transaction Number
•Social Security Number

Ordinal Variables

• Ordinal Variable: A variable that reports order without natural units

• Four-point Likert Scale: Strongly Disagree, Disagree, Agree, Strongly Agree

• Olympic Rank: Gold, Silver, Bronze

• Can be treated as quantitative by using the rank number

• 1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree
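
A small Python sketch of that rank coding, using the 1 to 4 codes above and a handful of hypothetical survey responses. The median is used as the summary here because it depends only on the order of the codes; the codes carry order but no natural units, so the gap between 1 and 2 need not mean the same thing as the gap between 3 and 4.

```python
import statistics

# Rank codes for the four-point Likert scale listed above.
LIKERT = {"Strongly Disagree": 1, "Disagree": 2, "Agree": 3, "Strongly Agree": 4}

# Hypothetical responses, made up for illustration.
responses = ["Agree", "Disagree", "Strongly Agree", "Agree", "Agree", "Strongly Disagree"]

codes = [LIKERT[r] for r in responses]   # ordinal labels -> rank numbers
middle = statistics.median(codes)        # uses only the ordering of the codes
print(codes, middle)
```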

Quantitative And Categorical Data

• Data collected from a quantitative variable are called quantitative data. Examples include the heights and weights of students and the times to complete different tasks.

• Data collected from a categorical variable are called categorical data. An example is eye color.

Chapter 3

Displaying and Summarizing Quantitative Data

3.1

Displaying Quantitative Variables

Histograms

• A histogram is a summary graph showing a count of the data falling in various ranges, called bins (also classes or groups).

• The purpose is to graphically summarize and display the distribution of a data set.

• It is particularly useful when there is a large number of observations.

• The data for which we draw a histogram are values of a quantitative variable.
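
The counting step behind a histogram can be sketched in a few lines of Python. This sketch assumes half-open bins [left, left + width), so a value on a boundary goes into the bin on its right, and assumes the first bin starts at or below the smallest value. Statistical packages, Minitab included, may use their own boundary and bin-width rules, so this illustrates the idea rather than reproducing any package. The quiz scores are hypothetical.

```python
import math

def histogram_counts(data, start, width):
    """Count the values in consecutive bins [start + k*width, start + (k+1)*width).

    Assumes start <= min(data). Empty bins are kept with a count of 0;
    they appear as gaps in the histogram.
    """
    n_bins = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - start) / width)] += 1   # index of the bin holding x
    return [(start + k * width, count) for k, count in enumerate(counts)]

# Hypothetical quiz scores, made up for illustration.
scores = [52, 58, 61, 64, 67, 68, 71, 73, 75, 75, 78, 95]
width = 10
for left_edge, count in histogram_counts(scores, start=50, width=width):
    print(f"{left_edge}-{left_edge + width}: {count}")
```

With these scores the 80-90 bin is empty, which would show as a gap in the histogram.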

Constructing Histograms Using Minitab Commands

• Open Minitab and enter the data set in a column (e.g., C1)

• Click on Graph, then on Histogram, then on Simple, then on OK

• Click on C1, then on Select

• Click on Labels, then on Title, and type the title of the histogram

• Click on OK

Histograms

•Histogram: A chart that displays quantitative data

• Great for seeing the distribution of the data

• Most tsunami-generating earthquakes have magnitudes between 6.5 and 8.

• Quakes as large as the Japan (9.0) and Sumatra (9.1) earthquakes are rare.

• Quakes under 5 rarely cause tsunamis.

• Quakes with magnitudes between 7.0 and 7.5 cause tsunamis most often.

Figure: A histogram of tsunami-generating earthquakes

Choosing the Bin Width

•Different bin widths tell different stories.

•Choose the width that best shows the important features.

•Presentations can feature two histograms that present the same data in different ways.

•A gap in the histogram means that there were no occurrences in that range.

Relative Frequency Histograms

•Relative Frequency Histogram

•The vertical axis represents the relative frequency, the frequency divided by the total.

•The horizontal axis is the same as the horizontal axis for the frequency histogram.

•The shape of the relative frequency histogram is the same as the frequency histogram.

•Only the scale of the y-axis is different.
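
Computing relative frequencies takes one division per bin. In this Python sketch the five bin counts are hypothetical. Because every count is divided by the same total, the bars keep their relative heights, which is why the histogram's shape does not change.

```python
def relative_frequencies(counts):
    """Divide each bin's frequency by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]

counts = [2, 4, 5, 0, 1]            # frequencies for five bins (hypothetical)
rel = relative_frequencies(counts)
print([round(r, 3) for r in rel])   # [0.167, 0.333, 0.417, 0.0, 0.083]
```

The relative frequencies always add up to 1 (up to rounding).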

Stem-and-Leaf Displays

•Stem-and-Leaf: Shows both the shape of the distribution and all of the individual values

•Not as visually pleasing as a histogram; more technical looking

•Best suited to small collections of data

•The first column (stems) represents the leftmost digit.

•The second column (leaves) shows the remaining digit(s).
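
A plain-text stem-and-leaf display takes only a few lines of Python. This sketch assumes non-negative whole numbers and uses all digits except the last as the stem and the last digit as the leaf. That is one common convention; for two-digit data it matches the leftmost-digit description above. The data are the dinner-party ages from the Range section, and the empty stems 4 through 6 keep the gap before 74 visible.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return a stem-and-leaf display for non-negative integers.

    Stem = every digit except the last; leaf = the last digit.
    Empty stems are kept so that gaps in the data remain visible.
    """
    rows = defaultdict(list)
    for x in sorted(data):
        rows[x // 10].append(x % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)

ages = [16, 18, 23, 23, 27, 35, 74]   # dinner-party ages from the Range section
print(stem_and_leaf(ages))
```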

Dotplots

•Dotplot: Displays dots to describe the shape of the distribution

•There were 30 races with a winning time of 122 seconds.

•Good for smaller data sets

•Visually more appealing than stem-and-leaf

•In StatCrunch: Graphics → Dotplot

Example

The data below give the number of hurricanes that happened each year from 1944 through 2000, as reported by Science magazine.

• 3,2,1,2,4,3,7,2,3,3,2,5,2,2,4,2,2,6,0,2,5,1,3,1,0,3,2,1,0,1,2,3,2,1,2,2,2,3,1,1,1,3,0,1,3,2,1,2,1,1,0,5,6,1,3,5,3

Frequency Table For Hurricane Data

No. of hurricanes    Frequency (count)
0                    5
1                    14
2                    17
3                    12
4                    2
5                    4
6                    2
7                    1
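
The frequency table can be checked by machine. A minimal Python sketch using the standard library's `collections.Counter`, with the 57 yearly counts copied from the example above:

```python
from collections import Counter

# Hurricanes per year, 1944 through 2000 (the 57 values listed in the example).
hurricanes = [3, 2, 1, 2, 4, 3, 7, 2, 3, 3, 2, 5, 2, 2, 4, 2, 2, 6, 0, 2, 5, 1, 3, 1, 0,
              3, 2, 1, 0, 1, 2, 3, 2, 1, 2, 2, 2, 3, 1, 1, 1, 3, 0, 1, 3, 2, 1, 2, 1, 1,
              0, 5, 6, 1, 3, 5, 3]

freq = Counter(hurricanes)   # maps each value to the number of years it occurred
print("No. of hurricanes  Frequency")
for value in range(min(hurricanes), max(hurricanes) + 1):
    print(f"{value:>17}  {freq[value]:>9}")
print("Total:", sum(freq.values()))
```

The frequencies add up to 57, one per year from 1944 through 2000.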

Dot Plot For Hurricane Data

Figure: Dot plot for hurricane data (horizontal axis: number of hurricanes per year, 0 to 7)
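
A dot plot can be approximated in plain text. This Python sketch is turned sideways relative to the usual display (values run down the left, dots run across) and builds each row directly from the hurricane frequency table; the `mark` character is an arbitrary choice.

```python
# Frequencies from the hurricane frequency table.
freq = {0: 5, 1: 14, 2: 17, 3: 12, 4: 2, 5: 4, 6: 2, 7: 1}

def text_dotplot(freq, mark="o"):
    """Sideways dot plot: one row per value, one mark per year with that value."""
    return "\n".join(f"{value} | {mark * count}" for value, count in sorted(freq.items()))

print(text_dotplot(freq))
```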

Think Before You Draw

•Is the variable quantitative? Is the answer to the survey question or result of the experiment a number whose units are known?

•Histograms, stem-and-leaf diagrams, and dotplots can only display quantitative data.

•Bar and pie charts display categorical data.

3.2

Shape

Describing The Distribution of a Quantitative Variable From Histograms

• When you describe the distribution of a quantitative variable, you should always tell about four things:

• The shape
• The center
• The spread
• Unusual features or outliers

The Shape of a Distribution

• Does the histogram have a single, central hump or several separated humps? These humps are called modes.

• A histogram with one peak is called unimodal; histograms with two peaks are called bimodal; and those with three or more peaks are called multimodal.

• A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform.

Modes

•A mode of a histogram is a hump or high-frequency bin.
•One mode → Unimodal
•Two modes → Bimodal
•3 or more → Multimodal

Figure panels: Unimodal, Bimodal, Multimodal

Uniform Distributions

•Uniform Distribution: All the bins have the same frequency, or at least close to the same frequency.

•The histogram for a uniform distribution will be flat.

Is the Histogram Symmetric or Skewed?

• Can you fold the histogram along a vertical line through the middle and have the edges match pretty closely, or are more of the values on one side?

• The (usually) thinner ends of a distribution are called tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.

• A skewed-right distribution is one in which the longer tail is on the right side.

• A skewed-left distribution is one in which the longer tail is on the left side.

Symmetry

•The histogram for a symmetric distribution will look the same on the left and the right of its center.

Figure panels: Symmetric, Not symmetric, Symmetric

Symmetric Histogram

Skew

•A histogram is skewed right if the longer tail is on the right side of the mode.

•A histogram is skewed left if the longer tail is on the left side of the mode.

Figure panels: Skewed left, Skewed right

Left-Skewed Histogram

Right-Skewed Histogram

Outliers

•An outlier is a data value that is far above or far below the rest of the data values.

•An outlier is sometimes just an error in the data collection.

•An outlier can also be the most important data value.

•Income of a CEO

•Temperature of a person with a high fever

•Elevation at Death Valley

Example

•The histogram shows the amount of money spent by a credit card company’s customers. Describe and interpret the distribution.

•The distribution is unimodal. Customers most commonly spent a small amount of money.

•The distribution is skewed right. Many customers spent only a small amount and a few were spread out at the high end.

•There is an outlier at around $7000. One customer spent much more than the rest of the customers.

3.3

Center

The Center of the Distribution

• The center is a value that attempts the impossible by summarizing the entire distribution with a single number, a “typical” value.

• Measures of center include the mean and the median.

• When a histogram is unimodal and symmetric, we would all agree on the center: the center of symmetry, where we would fold the histogram to match the two sides.

• When the distribution is skewed or possibly multimodal, defining the center is more of a challenge.

The Median

•Median: The center of the data values

•Half of the data values are to the left of the median and half are to the right of the median.

•For symmetric distributions, the median is directly in the middle.

Calculating the Median: Odd Sample Size

•First order the numbers.

•If there is an odd number of values, n, the median is at position $\frac{n+1}{2}$.

•Find the median of the numbers: 2, 4, 5, 6, 7, 9, 9.

•Here n = 7, so the median is at position $\frac{7+1}{2} = 4$.

•The median is the fourth number: 6

•Note that there are 3 numbers to the left of 6 and 3 to the right.

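Calculating the Median: Even Sample Size

•First order the numbers.

•If there is an even number of values, n, the median is the average of the two middle numbers, which are at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.

•Find the median of the numbers: 2, 2, 4, 6, 7, 8.

•Here n = 6, so the middle numbers are at positions $\frac{6}{2} = 3$ and $\frac{6}{2} + 1 = 4$.

•The median is the average of the third and the fourth numbers: $\text{Median} = \frac{4 + 6}{2} = 5$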
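
The odd and even position rules combine into one short function. A Python sketch (a helper written for these notes, which should agree with the standard library's `statistics.median` on numeric data) that converts the 1-based positions used in the slides into 0-based list indices:

```python
def median(values):
    """Median via the position rules above (the slides count positions from 1)."""
    xs = sorted(values)                          # first, order the numbers
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]              # position (n+1)/2, as a 0-based index
    return (xs[n // 2 - 1] + xs[n // 2]) / 2     # average of positions n/2 and n/2 + 1

print(median([2, 4, 5, 6, 7, 9, 9]))   # 6
print(median([2, 2, 4, 6, 7, 8]))      # 5.0
```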

3.4

Spread

Spread

•Locating the center is only part of the story

•Are the data all near the center or are they spread out?

•Is the highest value much higher than the lowest value?

•To describe data, we must discuss both the center and the spread.

Measures of Spread of Quantitative Data

• A measure of spread is a numerical summary of how tightly the values are clustered around the center.

• Measures of spread include:
- Range
- Interquartile range (IQR)
- Standard deviation

Range

•The range is the difference between the maximum and minimum values.

Range = Maximum – Minimum

•The ages of the guests at your dinner party are: 16, 18, 23, 23, 27, 35, 74

•The range is: 74 – 16 = 58

•The range is sensitive to outliers. A single high or low value will affect the range significantly.
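
A two-line Python check of the dinner-party example makes the sensitivity concrete: dropping the single 74-year-old guest shrinks the range from 58 to 19.

```python
def data_range(values):
    """Range = Maximum - Minimum."""
    return max(values) - min(values)

ages = [16, 18, 23, 23, 27, 35, 74]   # dinner-party ages from the example
print(data_range(ages))         # 58
print(data_range(ages[:-1]))    # 19: the same guests without the 74-year-old
```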

Percentiles and Quartiles

•Percentiles divide the data into one hundred groups.

•The nth percentile is the data value such that n percent of the data lies below that value.

•For large data sets, the median is the 50th percentile.

•The median of the lower half of the data is the 25th percentile and is called the first quartile (Q1).

•The median of the upper half of the data is the 75th percentile and is called the third quartile (Q3).

The Interquartile Range

•The Interquartile Range (IQR) is the difference between the upper quartile and the lower quartile: IQR = Q3 – Q1

•The IQR measures the range of the middle half of the data.

•Example: If Q1 = 23 and Q3 = 44, then IQR = 44 – 23 = 21

Example

(1) Find Q1, Q3, and IQR for the dataset 7, 3, 5, 1, 9 (n = odd)

(2) Find Q1, Q3, and IQR for the dataset 7, 3, 5, 1, 9, 11 ( n = even)
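
The quartiles are defined above as the medians of the lower and upper halves of the ordered data. A Python sketch of that median-of-halves method, applied to both example datasets, follows. When n is odd, textbooks and software disagree about whether the median itself belongs to both halves, so the sketch makes this a parameter; the name `include_median` and its default are choices made for this sketch, not something the notes specify. For dataset (1) the two conventions give different answers. Packages that interpolate between values (Minitab, or Python's `statistics.quantiles`) may return other quartiles again.

```python
def median(xs):
    """Median of an already sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

def quartiles(values, include_median=False):
    """Return (Q1, Q3, IQR) as medians of the lower and upper halves.

    For odd n, include_median=True puts the middle value in both halves;
    False leaves it out of both. Needs at least two values.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = xs[: half + 1], xs[half:]
    elif n % 2 == 1:
        lower, upper = xs[:half], xs[half + 1:]
    else:
        lower, upper = xs[:half], xs[half:]
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1

print(quartiles([7, 3, 5, 1, 9]))                       # (2.0, 8.0, 6.0)
print(quartiles([7, 3, 5, 1, 9], include_median=True))  # (3, 7, 4)
print(quartiles([7, 3, 5, 1, 9, 11]))                   # (3, 9, 6)
```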

The Interquartile Range

•The Interquartile Range for tsunami-causing earthquakes is 0.9.

•The picture below shows the meaning of the IQR.

Benefits and Drawbacks of the IQR

•The Interquartile Range is not sensitive to outliers.

•The IQR provides a reasonable summary of the spread of the distribution.

•The IQR shows where typical values are, except for the case of a bimodal distribution.

•The IQR is not ideal for a general audience, since most people do not know what it is.

Extra credit

Find the median, Q1, Q3, and IQR of the following dataset:

(a) 45, 46, 49, 35, 76, 80, 89, 94, 37, 61, 62, 64, 68, 56, 57, 57, 59, 71, 72.

(b) 850, 900, 1400, 1200, 1050, 1000, 750, 1250, 1050, 565

3.5

Boxplots and 5-Number Summaries

5-Number Summary

•The 5-Number Summary provides a numerical description of the data. It consists of the
  •Minimum
  •First Quartile (Q1)
  •Median
  •Third Quartile (Q3)
  •Maximum

•For the tsunami data, the 5-Number Summary is: minimum 3.7, Q1 = 6.7, median 7.2, Q3 = 7.6, maximum 9.1.
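The individual tsunami magnitudes are not listed here, so the sketch below computes a 5-Number Summary for the small exercise data set 7, 3, 5, 1, 9, 11 instead. It assumes the "median of each half" quartile rule, leaving the middle value out of both halves when n is odd; `five_number_summary` is a hypothetical helper name.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use the 'median of each half' rule; when n is odd the middle
    value is left out of both halves (an assumed convention).
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


labels = ("Minimum", "Q1", "Median", "Q3", "Maximum")
values = five_number_summary([7, 3, 5, 1, 9, 11])
for label, value in zip(labels, values):
    print(f"{label:>8}: {value}")
```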

Interpreting the 5-Number Summary

•The smallest tsunami-causing earthquake had magnitude 3.7.

•The largest tsunami-causing earthquake had magnitude 9.1.

•The middle half of tsunami-causing earthquakes have magnitudes between 6.7 and 7.6.

•Half of tsunami-causing earthquakes have magnitudes below 7.2 and half are above 7.2.

•A tsunami-causing earthquake with magnitude less than 6.7 is relatively small.

•A tsunami-causing earthquake with magnitude more than 7.6 is relatively large.

Boxplots

•A Boxplot is a chart that displays the 5-Number Summary and the outliers.

•The Box shows the Interquartile Range.

•The dashed lines are called fences; data values outside the fences are outliers.

•Above and below the box are the whiskers that display the most extreme data values within the fences.

•The line inside the box shows the median.

Finding the Fences

•The lower fence is defined by Lower Fence = Q1 – 1.5 × IQR

•The upper fence is defined by Upper Fence = Q3 + 1.5 × IQR

•Tsunami Example: Q1 = 6.7 and Q3 = 7.6, so IQR = 7.6 – 6.7 = 0.9

•Lower Fence = 6.7 – 1.5 × 0.9 = 5.35

•Upper Fence = 7.6 + 1.5 × 0.9 = 8.95
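The fence arithmetic can be checked with a short Python sketch. `fences` and `outliers` are hypothetical helper names; the 1.5 × IQR multiplier comes from the slide, while treating a value lying exactly on a fence as not an outlier is an assumption. The check uses the tsunami quartiles together with the minimum (3.7) and maximum (9.1) magnitudes quoted for the tsunami data.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(data, q1, q3, k=1.5):
    """Return the values lying strictly outside the fences."""
    low, high = fences(q1, q3, k)
    return [y for y in data if y < low or y > high]


# Tsunami example: Q1 = 6.7, Q3 = 7.6
low, high = fences(6.7, 7.6)
print(f"Lower fence = {low:.2f}, upper fence = {high:.2f}")

# Minimum, Q1, median, Q3, and maximum quoted for the tsunami data
print(outliers([3.7, 6.7, 7.2, 7.6, 9.1], 6.7, 7.6))
```

Up to floating-point rounding, the fences come out as 5.35 and 8.95, and both the minimum and the maximum fall outside them, so a boxplot would mark both as outliers.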

Self-read example: Flight Cancellations (the next four slides)

Step-by-Step Example of Shape, Center, Spread: Flight Cancellations

•Question: How often are flights cancelled?

•Who? Months

•What? Percentage of Flights Cancelled at U.S. Airports

•When? 1995 – 2011

•Where? United States

•How? Bureau of Transportation Statistics Data

Flight Cancellations: Think

•Identify the Variable
  •Percent of flight cancellations at U.S. airports
  •Quantitative: units are percentages.

•How will the data be summarized?
  •Histogram
  •Numerical Summary
  •Boxplot

Flight Cancellations: Show

•Use StatCrunch to create the histogram, boxplot, and numerical summary.

Flight Cancellations: Tell

•Describe the shape, center, and spread of the distribution. Report on the symmetry, number of modes, and any gaps or outliers. You should also mention any concerns you may have about the data.

•Skewed to the right: a percentage can’t be negative, while bad weather and other airport troubles can cause extreme cancellation rates.

•The IQR is small (1.23%), showing consistency among the monthly cancellation percentages.

•Extraordinary outlier at 20.2%: September 2001

3.6

The Center of Symmetric Distributions: The Mean

The Mean

•The Mean is what most people think of as the average.

•Add up all the numbers and divide by the number of numbers.

•Recall that $\sum$ means “add them all.”

•In StatCrunch, the mean is listed in the Summary Statistics.

$$\bar{y} = \frac{\sum y}{n}$$
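The formula translates directly into code. A minimal Python sketch follows; the `mean` helper is hypothetical, and Python's standard `statistics.mean` does the same job.

```python
def mean(values):
    """Return the sample mean: add them all (the sum), then divide by n."""
    values = list(values)
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


print(mean([7, 3, 5, 1, 9]))  # 25 / 5 = 5.0
```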

The Mean is the “Balancing Point”

•If you put your finger on the mean, the histogram will balance perfectly.

Mean Vs. Median

•For symmetric distributions, the mean and the median are approximately equal.

•The balancing point is at the center.

•In a skewed distribution, the tail “pulls” the mean toward it more than it pulls the median.

•The mean is more sensitive to outliers than the median.

The Mean Is Attracted to the Outlier

•The mean is larger than the median since it is “pulled” to the right by the outlier.

•The median is a better measure of the center for data that is skewed.
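A quick way to see the mean being pulled toward an outlier is to change one value and recompute both measures. The numbers below are made up for illustration; `statistics.mean` and `statistics.median` are from Python's standard library.

```python
from statistics import mean, median

original = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]  # the largest value replaced by an outlier

for name, data in (("original", original), ("with outlier", with_outlier)):
    print(f"{name:>12}: mean = {mean(data)}, median = {median(data)}")
```

Replacing 5 with 50 moves the mean from 3 to 12, while the median stays at 3.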

Why Use the Mean?

•Although the median is a better measure of the center for skewed data, the mean takes the size of every value, large and small, into account.

•The mean is easier to work with.

•For symmetric data, statisticians would rather use the mean.

•It is always ok to report both the mean and the median.

3.7

The Spread of Symmetric Distributions: The Standard Deviation

The Variance

•The variance is a measure of how far the data is spread out from the mean.

•The difference of a value from the mean is $y - \bar{y}$.

•To make each difference non-negative, square it.

•Then average these squared differences, except that instead of dividing by n, divide by n – 1.

•Use $s^2$ to represent the variance.

•The variance will mostly be used to find the standard deviation $s$, which is the square root of the variance.

$$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$
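The steps above (find the mean, square each difference, add, divide by n – 1) can be sketched in Python and compared with the standard library's `statistics.variance`, which also divides by n – 1. The data set and the helper name `sample_variance` are illustrative assumptions.

```python
import statistics


def sample_variance(ys):
    """s^2 = sum((y - ybar)^2) / (n - 1), following the steps on the slide."""
    ys = list(ys)
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two values")
    ybar = sum(ys) / n                             # 1. find the mean
    squared_devs = [(y - ybar) ** 2 for y in ys]   # 2. square each difference
    return sum(squared_devs) / (n - 1)             # 3. add them, divide by n - 1


data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(data)
print(f"s^2 = {s2:.4f}; statistics.variance gives {statistics.variance(data):.4f}")
```

For this data set the mean is 5, the squared differences add to 32, and $s^2 = 32/7 \approx 4.57$.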

Standard Deviation

•The variance’s units are the square of the original units.

•Taking the square root of the variance gives the standard deviation, which will have the same units as y.

•The standard deviation is close to the average distance of the y values from the mean.

•If data values are close to the mean (less spread out), then the standard deviation will be small.

•If data values are far from the mean (more spread out), then the standard deviation will be large.

$$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$
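To see how spread affects s, the sketch computes the standard deviation (the square root of the sample variance) for two made-up data sets with the same mean of 10. `sample_sd` is a hypothetical helper; `statistics.stdev` is used only as a cross-check.

```python
import math
import statistics


def sample_sd(ys):
    """s = sqrt(sum((y - ybar)^2) / (n - 1))."""
    ys = list(ys)
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two values")
    ybar = sum(ys) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))


close = [9, 10, 10, 10, 11]    # values bunched near the mean of 10
spread = [2, 6, 10, 14, 18]    # same mean, values far from it

for name, ys in (("close", close), ("spread", spread)):
    s = sample_sd(ys)
    assert math.isclose(s, statistics.stdev(ys))  # agrees with the library
    print(f"{name:>6}: s = {s:.2f}")
```

The bunched data give s ≈ 0.71, while the spread-out data give s ≈ 6.32.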

The Standard Deviation and Histograms

Order the histograms below from smallest standard deviation to largest standard deviation.

[Histograms A, B, C]

Answer: C, A, B

3.8

Summary—What to Tell About a Quantitative Variable

What to Tell

•Histogram, Stem-and-Leaf, Boxplot

•Describe modality, symmetry, outliers

•Center and Spread
  •Median and IQR if not symmetric
  •Mean and Standard Deviation if symmetric
  •For unimodal symmetric data, the IQR is usually a bit larger than s. If it is not, check for skewness, outliers, or errors.

•Unusual Features
  •For multiple modes, possibly split the data into groups.
  •When there are outliers, report the mean and standard deviation with and without the outliers.
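Reporting the mean and standard deviation with and without an outlier takes only a few lines. The fill-up readings below are hypothetical (they are not the data from the fuel-efficiency example), and 12 mpg plays the role of a suspected low outlier.

```python
import statistics

# Hypothetical fuel-efficiency readings in mpg; 12 is a suspected low outlier.
mpg = [21, 22, 22, 23, 23, 24, 12]
suspect = 12
without = [y for y in mpg if y != suspect]

for label, ys in (("with outlier", mpg), ("without outlier", without)):
    print(f"{label:>15}: mean = {statistics.mean(ys):.2f} mpg, "
          f"s = {statistics.stdev(ys):.2f} mpg")
```

Dropping the single low value raises the mean from 21 to 22.5 mpg and shrinks s from about 4.08 to about 1.05 mpg, which is the kind of change worth reporting.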

Self-read example: Fuel Efficiency (the next two slides)

Example: Fuel Efficiency

•The car owner has checked the fuel efficiency each time he filled the tank. How would you describe the fuel efficiency?

•Plan: Summarize the distribution of the car’s fuel efficiency.

•Variable: mpg for 100 fill ups, Quantitative

•Mechanics: show a histogram
  •Fairly symmetric
  •Low outlier

Fuel Efficiency Continued

•Which to report?
  •The mean and median are close.
  •Report the mean and standard deviation.

•Conclusion
  •Distribution is unimodal and symmetric.
  •Mean is 22.4 mpg.
  •The low outlier may be investigated, but it has limited effect on the mean.
  •s = 2.45: from one filling to the next, fuel efficiency differs from the mean by an average of about 2.45 mpg.

Self-read: What Can Go Wrong? (the next three slides)

What Can Go Wrong?

•Don’t make a histogram for categorical data.

•Don’t look for shape, center, and spread for a bar chart.

•Choose a bin width appropriate for the data.

What Can Go Wrong? Continued

•Do a reality check.

•Don’t blindly trust your calculator. For example, a mean student age of 193 years old is nonsense.

•Sort before finding the median and percentiles: 315, 8, 2, 49, 97 does not have a median of 2.

•Don’t worry about small differences in the quartile calculation.

•Don’t compute numerical summaries for a categorical variable.

•The mean Social Security number is meaningless.

What Can Go Wrong? Continued

•Don’t report too many decimal places.
  •Citing the mean fuel efficiency as 22.417822453 is going overboard.

•Don’t round in the middle of a calculation.

•For multiple modes, think about separating groups.
  •Heights of people → separate men and women

•Beware of outliers: the mean and standard deviation are sensitive to them.

•Use a histogram or dotplot to ensure that the mean and standard deviation really do describe the data.

PARTIAL STUDY MATERIAL FOR TAKE HOME EXAMINATION

CHAPTER 2

Displaying and Describing Categorical Data

Frequency Tables

•A frequency table is a table whose first column displays each distinct outcome and second column displays that outcome’s frequency.

•If there are many distinct outcomes, combining them into a few categories is recommended.
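A frequency table is a count of each distinct outcome. A minimal Python sketch using the standard library's `collections.Counter`, with made-up category labels:

```python
from collections import Counter

# Hypothetical responses for one categorical variable.
responses = ["Red", "Blue", "Red", "Green", "Blue", "Red", "Green", "Red"]

freq = Counter(responses)  # maps each distinct outcome to its frequency

print(f"{'Outcome':<8}{'Frequency':>10}")
for outcome, count in freq.most_common():  # most frequent first
    print(f"{outcome:<8}{count:>10}")
```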

Relative Frequency Tables

•A relative frequency table is a table whose first column displays each distinct outcome and second column displays that outcome’s relative frequency.

•The relative frequency table is similar to the frequency table, but it displays relative frequencies rather than frequencies.
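Dividing each frequency by the total count turns a frequency table into a relative frequency table. A short Python sketch with made-up category labels:

```python
from collections import Counter

# Hypothetical responses for one categorical variable.
responses = ["Red", "Blue", "Red", "Green", "Blue", "Red", "Green", "Red"]

n = len(responses)
rel_freq = {outcome: count / n for outcome, count in Counter(responses).items()}

print(f"{'Outcome':<8}{'Relative frequency':>20}")
for outcome, share in sorted(rel_freq.items(), key=lambda item: item[1], reverse=True):
    print(f"{outcome:<8}{share:>20.1%}")
```

Because every case falls in exactly one category, the relative frequencies add up to 1 (100%).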

Bar Charts

•A bar chart displays the frequency or relative frequency of each category.

•All bars must have the same width.

•Good for a general audience.

Pie Charts

•A pie chart presents each category as a slice of a circle, so that each slice’s size is proportional to that category’s share of the whole.

•Pie charts are also good for a general audience.

•Pie charts help to display the fraction of the whole that each category represents.

Think Before You Draw

•Choose the chart that best tells the story of your data.

•Think about the intended audience to select a chart that is best for them.

•Charts often work better when the categories do not overlap.

•Don’t try to fool your audience; just give a chart that honestly expresses the interesting features of the data.

2.2

Exploring the Relationship Between Two Categorical Variables

Contingency Tables

•A contingency table is a table that displays two categorical variables and their relationships.

•There were 528 third-class ticket holders who died.

•The bottom row represents the totals for class and is called the marginal distribution.

•The right column represents the totals for survival and is also a marginal distribution.
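Marginal totals are just row and column sums. The Python sketch below rebuilds a contingency table with both margins, survival in the rows and ticket class in the columns. The counts are an assumption: they are the widely used Titanic passenger-and-crew counts (2,201 people), which agree with the 528 third-class deaths quoted above, but check them against the table on your slide.

```python
# Assumed Titanic counts by ticket class and survival.
counts = {
    "First": {"Alive": 203, "Dead": 122},
    "Second": {"Alive": 118, "Dead": 167},
    "Third": {"Alive": 178, "Dead": 528},
    "Crew": {"Alive": 212, "Dead": 673},
}
outcomes = ["Alive", "Dead"]

# Marginal totals: one per class (bottom row) and one per outcome (right column).
class_totals = {c: sum(row.values()) for c, row in counts.items()}
outcome_totals = {o: sum(row[o] for row in counts.values()) for o in outcomes}
grand_total = sum(class_totals.values())

print(f"{'':<8}" + "".join(f"{c:>8}" for c in counts) + f"{'Total':>8}")
for o in outcomes:
    cells = "".join(f"{counts[c][o]:>8}" for c in counts)
    print(f"{o:<8}{cells}{outcome_totals[o]:>8}")
print(f"{'Total':<8}" + "".join(f"{class_totals[c]:>8}" for c in counts)
      + f"{grand_total:>8}")
```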

Table of Percents

•A table of percents can be misleading.

•Looking at “Alive”, was it better to have a second- or third-class ticket?

•8.1% of the total were third-class survivors, while 5.4% were second-class survivors.

•What is wrong with just comparing these percentages?

Conditional Distributions

•A conditional distribution shows the distribution of one variable for just those individuals who satisfy a condition on another variable.

•25.2% of all third-class ticket holders survived.

•Was it better to have a second- or third-class ticket?
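Conditioning on ticket class means dividing each class's survivors by that class's own total rather than by the grand total. The sketch prints both percentages side by side. The counts are assumed to be the standard Titanic figures (2,201 people), which reproduce the 25.2%, 8.1%, and 5.4% quoted on these slides; check them against your own table.

```python
# Assumed Titanic counts: ticket class -> (alive, dead)
counts = {
    "First": (203, 122),
    "Second": (118, 167),
    "Third": (178, 528),
    "Crew": (212, 673),
}
grand_total = sum(alive + dead for alive, dead in counts.values())

survival_rate = {}  # conditional distribution of survival, given class
for cls, (alive, dead) in counts.items():
    survival_rate[cls] = alive / (alive + dead)   # condition on the class total
    share_of_total = alive / grand_total          # percent of the whole table
    print(f"{cls:<7} survivors: {share_of_total:6.1%} of the total, "
          f"{survival_rate[cls]:6.1%} of their class")
```

Under these counts about 41.4% of second-class ticket holders survived versus 25.2% of third-class ticket holders, even though third-class survivors make up the larger share of the whole table (8.1% vs. 5.4%).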

Conditional Distribution: Rows or Columns

•The “Condition” can either be based on rows or columns.

•This table shows that the highest percent of survivors were crew members.

•The highest percent of the dead were also crew members.

Conditional Distributions as Pie Charts

•Pie charts can give a visual representation of the conditional distributions.

•Compare how the first-class ticket holders were represented among the survivors vs. the dead.

Bar Charts

•Bar charts can also effectively tell the story for conditional distributions.

•Which is best: Table, Pie chart, or Bar Graph?

Independence

•Independence: The distribution of one variable is the same for all categories of another.

•For dependent variables, there is an association between the two variables.
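The definition of independence can be checked mechanically: compute each row's conditional distribution and see whether they match. The sketch uses two hypothetical 2 × 2 tables, and the helper names and the tolerance `tol` are assumptions. With real data the distributions are almost never exactly equal, so deciding whether a difference is large enough to matter remains a judgment call.

```python
def conditional_rows(table):
    """Convert each row of counts into proportions (a conditional distribution)."""
    return {row: [count / sum(cells) for count in cells] for row, cells in table.items()}


def same_conditional_distributions(table, tol=1e-9):
    """True if every row's conditional distribution matches the first row's within tol."""
    dists = list(conditional_rows(table).values())
    first = dists[0]
    return all(abs(a - b) <= tol for d in dists[1:] for a, b in zip(first, d))


# Hypothetical 2 x 2 tables: rows are groups, columns are (yes, no) counts.
independent = {"Group A": [20, 80], "Group B": [40, 160]}  # 20% "yes" in both rows
associated = {"Group A": [20, 80], "Group B": [90, 110]}   # 20% vs. 45% "yes"

print(same_conditional_distributions(independent))  # True
print(same_conditional_distributions(associated))   # False
```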

Independence Example

•Is there an association between gender and interest in Super Bowl TV Coverage?

•Large difference for men between watching the game and commercials

•Smaller difference for women

•There is an association between gender and interest.

What’s Wrong With These Charts?

[Two charts, captioned “Violates the area principle” and “Add up the percents”]

More Words of Caution

•Don’t confuse percents of the whole with marginal percents.

•Don’t leave out marginal percents.

•Don’t make conclusions based on only a handful of individuals.

•Don’t draw conclusions about association or independence from only a small difference.

Simpson’s Paradox

•Which pilot had a better on-time flight record?

•Moe was better overall.

•Jill was better for both day and night flights.

•Simpson’s Paradox: one pilot has the higher on-time rate overall, while the other has the higher rate in every category.

[Table: Number of On-Time Flights]
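The paradox is easy to reproduce with made-up counts. The numbers below are hypothetical and are not necessarily the counts in the slide's table: Moe flies mostly day flights, which are on time more often, while Jill flies mostly night flights.

```python
# Hypothetical counts: pilot -> time of day -> (on-time flights, total flights)
flights = {
    "Moe": {"Day": (90, 100), "Night": (10, 20)},
    "Jill": {"Day": (19, 20), "Night": (75, 100)},
}

overall = {}
for pilot, by_time in flights.items():
    on_time = sum(on for on, _ in by_time.values())
    total = sum(n for _, n in by_time.values())
    overall[pilot] = on_time / total
    by_category = ", ".join(f"{t} {on / n:.0%}" for t, (on, n) in by_time.items())
    print(f"{pilot:<5} {by_category}, overall {overall[pilot]:.1%}")
```

Jill has the higher on-time rate for both day (95% vs. 90%) and night (75% vs. 50%) flights, yet Moe's overall rate is higher (83.3% vs. 78.3%) because most of his flights are the easier day flights.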

Learning Objectives

•Summarize categorical data by counting cases and expressing the results as percents.

•Create and interpret bar charts, pie charts and contingency tables.

•Interpret marginal and conditional distributions.

•Make conclusions about independence and associations from analyzing conditional distributions.