30
Graphs and Describing Distributions Section 1.5 Cathy Poliak, Ph.D. [email protected] Department of Mathematics University of Houston January 26, 2016 Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston ) Section 1.5 January 26, 2016 1 / 30

Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. [email protected] (Department

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Graphs and Describing DistributionsSection 1.5

Cathy Poliak, [email protected]

Department of MathematicsUniversity of Houston

January 26, 2016

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 1 / 30

Page 2: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Outline

1 Graphs for Categorical Variables

2 Graphs for Quantitative Variables

3 Describing Distributions

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 2 / 30

Page 3: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Data of basketball shoes

We will be demonstrating these graphs with this dataset called shoes.

Name Brand PriceadiPower Howard 2 Adidas 75adiZero Crazy Light Adidas 90

adiZero Crazy Light 2 Adidas 140...

1 Flight Nike Jordan 1001 Flight Low Nike Jordan 95

...Air Max CB34 Nike 110

Air Max Dominate Nike 75...

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 3 / 30

Page 4: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Graphs of a Categorical Variable

Bar graphs: Each individual bar represents a category and theheight of each of the bars are either represented by the count orpercent.Pie charts: Helps us see what part of the whole each groupforms.

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 4 / 30

Page 5: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Bar graph

Adidas Ekito Nike Reebok

010

2030

40

plot(shoes$Brand)Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 5 / 30

Page 6: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Pie Chart

Adidas

EkitoNike

Nike Jordan

ReebokUnder Armor

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 6 / 30

Page 7: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

R code

For bar graph: plot(datasetname$variablename)

For pie chart:> counts<-table(shoes$Brand)> pie(counts)

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 7 / 30

Page 8: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Graphs for quantitative variables

Dotplot

Stemplot

Histogram

Boxplot

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 8 / 30

Page 9: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Dot plots

A dot plot is made by putting dots above the values listed on anumber line.

0 50 100 150 200 250 300

Price of Basketball Shoes

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 9 / 30

Page 10: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Stem - and - leaf plot

1. Separate each observation into a stem consisting of all but thefinal rightmost digit and a leaf, the final digit. Stems may have asmany digits as needed, but each leaf contains only a single digit.

2. Write the stems in a vertical column with the smallest at the top,and draw a vertical line at the right of this column.

3. Write each leaf in the row to the right of its stem, in increasingorder out from the stem.

Rcode: stem(dataset name$variable name)

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 10 / 30

Page 11: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Example of Stem-and-leaf Plot

> stem(shoes$Price)

The decimal point is 1 digit(s) to the right of the |

4 | 0500006 | 00000000000000055558 | 0000055550000000000000555

10 | 0000050000000000512 | 000000555014 | 000000016 | 000018 |20 |22 | 524 | 00

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 11 / 30

Page 12: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Histograms

Bar graph for quantitative variables.Values of the variable are grouped together.Bars are touching.The width of the bar represents an interval of values (range ofnumbers) for that variable.The height of the bar represents the number of cases within thatrange of values.

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 12 / 30

Page 13: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

To create a histogram

1. Divide the range of data into classes of equal width. For examplethe price of the basketballs shoes are from $40 to $250 dollars.We can use a width of $20 for the classes. Thus the classes are:

40 ≤ price < 6060 ≤ price < 80

...240 ≤ price < 260

Be sure to specify the classes precisely so that each individualprice falls into exactly one class and all of the prices are counted.

2. Count the number of shoe prices in each class.

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 13 / 30

Page 14: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Counts of the classes

Price (Classes) Count40 ≤ price < 60 660 ≤ price < 80 1980 ≤ price < 100 25100 ≤ price < 120 17120 ≤ price < 140 10140 ≤ price < 160 7160 ≤ price < 180 4180 ≤ price < 200 0200 ≤ price < 220 0220 ≤ price < 240 1240 ≤ price < 260 2

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 14 / 30

Page 15: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Draw the histogram

1. Mark on the horizontal axis the scale for the variable whosedistribution you are displaying.

2. The vertical axis contains the scale of the counts.3. Each bar represents a class. The base of the bar covers the width

of the classes, and the bar height is the class count. There is nohorizontal space between bars unless a class is empty, so that itsbar has height zero.

Rcode: hist(dataset name$variable name)

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 15 / 30

Page 16: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Histogram

Histogram of Price

Price

Fre

quen

cy

50 100 150 200 250

05

1015

2025

hist(shoes$Price, main = "Histogram of Price", xlab = "Price")

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 16 / 30

Page 17: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Boxplot

A graph of the five-number summary.I A central box spans the quartiles.I A line inside the box marks the median.I Lines extend from the box out to the smallest and largest

observations.I Asterisks represents any values that are considered to be outliers.

Boxplots are most useful for side-by-side comparison of severaldistributions.Rcode: boxplot(dataset name$variable name)

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 17 / 30

Page 18: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Boxplot of Prices

50 100 150 200 250

boxplot(shoes$Price,horizontal = T)Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 18 / 30

Page 19: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Boxplot of Prices by Brand

Adidas Ekito Nike Reebok

5010

015

020

025

0

boxplot(shoes$Price~shoes$Brand)Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 19 / 30

Page 20: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Cumulative Frequency Polygon

Plot a point above each upper class boundary at a height equal tothe cumulative frequency of the class.Connect the plotted points with line segments.A similar graph can be used with the cumulative percents.

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 20 / 30

Page 21: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Cumulative Percent Polygon

50 100 150 200 250

020

4060

8010

0

Cumulative Percent Polygon

Prices

Cum

ulat

ive

Per

cent

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 21 / 30

Page 22: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Question about the Graphs

Given the first type of plot indicated in each pair, which of the secondplots could not always be generated from it?a) dot plot, histogramb) stem and leaf, dot plotc) histogram, stem and leafd) dot plot, box plot

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 22 / 30

Page 23: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Distributions

When observing a data set, one of the first things we want to knowis how each variable is distributed.The distribution of a variable tells us what values it takes andhow often it takes these values based on the individuals.The distribution of a variable can be shown through tables,graphs, and numerical summaries.

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 23 / 30

Page 24: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Distributions for categorical variables

Lists the categories and gives either the count or the percent ofcases that fall in each category.One way is a frequency table that displays the differentcategories then the count or percent of cases that fall in eachcategory.Then we look at the graphs (bar or pie) to determine thedistribution of a categorical variable.

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 24 / 30

Page 25: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Describing distributions of quantitative variables

The distribution of a variable tells us what values it takes andhow often it takes these values.

There are four main characteristics to describe a distribution:1. Shape2. Center3. Spread4. Outliers

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 25 / 30

Page 26: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Describing distributions

An initial view of the distribution and the characteristics can beshown through the graphs.Then we use numerical descriptions to get a better understandingof the distributions characteristics.

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 26 / 30

Page 27: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Describing a distribution

ShapeI A distribution is symmetric if the right and left sides of the graph

are approximately mirror images of each other.I A distribution is skewed to the right if the right side (higher values)

of the graph extends much farther out than the left side.I A distribution is skewed to the left if the left side (lower values) of

the graph extends much farther out than the right side.I A distribution is uniform if the graph is at the same height

(frequency) from lowest to highest value of the variable.

Center - the values with roughly half the observations takingsmaller values and half taking larger values.Spread -from the graphs we describe the spread of a distributionby giving smallest and largest values.Outliers - individual values that falls outside the overall pattern.

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 27 / 30

Page 28: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Distribution of the price of basketball shoes

Shape - longer tail on the upper values (right) skewed right.

Center - Approximately $90. Looking at the boxplot and from thehistogram $90 is at approximately half of the area of the graphs.

Spread - The interval of values is from $40 to $250.

From the three graphs we can see three distinct outliers.

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 28 / 30

Page 29: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Distribution of exam 1 grades

1059075604530

50

40

30

20

10

0

Exam 1 Grade

Frequency

Histogram of Exam 1 Grade

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 29 / 30

Page 30: Graphs and Describing Distributionscathy/Math2311/Lectures/chapter... · 2016. 1. 18. · Air Max CB34 Nike 110 Air Max Dominate Nike 75 Cathy Poliak, Ph.D. cathy@math.uh.edu (Department

Distribution of prius hybrid MPG

5652484440

25

20

15

10

5

0

MPG

Frequency

Histogram of MPG

Cathy Poliak, Ph.D. [email protected] (Department of Mathematics University of Houston )Section 1.5 January 26, 2016 30 / 30