Making bar plots

Making bar plots...Bar plots of two-way tables Suppose you want to show that there is some kind of connection between the transmission system (automatic versus manual) and the number



Plots and contingency tables

Plots are graphical representations of data. Plots of categorical data can be made on the basis of contingency tables.


First plot: pie chart

pie(x)

Here pie is the function and x is its obligatory argument.

Make a pie chart for the “miles per gallon” (mpg) variable:

1. Make a contingency table of the mpg variable.
2. Make a pie chart with the function pie().
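A minimal sketch of these two steps, assuming the mpg variable comes from R's built-in mtcars dataset (the dataset used for the bar plots further on). The name mpg_table is only illustrative.

```r
# Step 1: contingency table of the mpg variable
# (assumption: mpg is the column of the built-in mtcars dataset)
mpg_table <- table(mtcars$mpg)

# Step 2: pie chart; the table is passed as the obligatory argument x
pie(mpg_table)
```

Because mpg takes many different values, the chart has many thin slices, which already hints at why a pie chart is not always the best choice.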


Although pie charts are very common, they are criticized by many statisticians, who recommend bar or dot plots over pie charts because people are able to judge length more accurately than area or angle (no more pie charts!).


Bar plot

A bar plot is useful for representing frequency distributions of categorical data. We can, for instance, make a bar plot of the number of gears variable (gear).

First we have to make a frequency distribution; then we use the function barplot().
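As a sketch, assuming gear is the column of the built-in mtcars dataset, the two steps could look like this (gear_table is an illustrative name):

```r
# Frequency distribution of the number of gears (built-in mtcars dataset)
gear_table <- table(mtcars$gear)
gear_table

# Bar plot of that frequency distribution: one bar per number of gears
barplot(gear_table)
```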


You can change the names of the bars with the names() function.
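A small sketch of renaming the bars, again assuming the gear table from mtcars; the labels chosen here are illustrative.

```r
# Frequency table of gears; its default names are "3", "4" and "5"
gear_table <- table(mtcars$gear)

# Replace the default names with more readable labels
names(gear_table) <- c("three gears", "four gears", "five gears")

# The new names now appear under the bars
barplot(gear_table)
```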


You can make your plot prettier by manipulating the colors, the font, etc. Here is an example for the colors, with the extra argument col.

http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
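A sketch of the col argument, assuming the gear table from mtcars. The three color names are taken from the commands later in these slides; any name listed by colors() would work.

```r
# Frequency table of gears (built-in mtcars dataset)
gear_table <- table(mtcars$gear)

# One color name per bar, given as a character vector
bar_colors <- c("aquamarine4", "cadetblue4", "chartreuse4")

barplot(gear_table, col = bar_colors)
```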

You can also add a title with the argument main.

Here is an example of changing the font size, with the arguments cex.axis (for the frequency scale on the y-axis), cex.names (for the names we gave to the columns), and cex.main (for the title).
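A sketch combining a title with the three font-size arguments, assuming the gear table from mtcars. The title text and the size values are illustrative; each cex value is a scaling factor relative to the default size of 1.

```r
# Frequency table of gears with readable bar names
gear_table <- table(mtcars$gear)
names(gear_table) <- c("three gears", "four gears", "five gears")

barplot(gear_table,
        main = "Number of gears",  # title of the plot
        cex.axis = 0.8,            # smaller numbers on the y-axis
        cex.names = 1.2,           # larger bar names
        cex.main = 1.5)            # larger title
```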


Bar plots of two-way tables

Suppose you want to show that there is some kind of connection between the transmission system (automatic versus manual) and the number of gears.

1. Create a two-way table from the mtcars dataset, using the functions with() and table(); the relevant variables are gear and am. Assign it to some name.
2. Make a barplot of this table.
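The two steps can be sketched as follows. The name mytwowaytable matches the one used in the commands on the following slides; putting gear first makes the gears the rows and the transmission type (am: 0 = automatic, 1 = manual) the columns.

```r
# Step 1: two-way table of gears (rows) by transmission type (columns)
mytwowaytable <- with(mtcars, table(gear, am))
mytwowaytable

# Step 2: stacked bar plot, one bar per column of the table
barplot(mytwowaytable)
```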


Is this an insightful table? Why or why not? What can we do to make it more insightful?

1. Adding names to the columns

> barplot(mytwowaytable, names=c("automatic", "manual"))

2. Adding a title to the graph

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears")

3. Changing the colors

http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine4", "cadetblue4", "chartreuse4"))

4. Adding a legend with the argument legend.text

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine", "cadetblue4", "chartreuse4"), legend.text=c("three gears", "four gears", "five gears"))

5. Alternative: grouped bar plot

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine", "cadetblue4", "chartreuse4"), legend.text=c("three gears", "four gears", "five gears"), beside=TRUE)


Bar plots summary

A bar plot is a good way of visually representing categorical data.

Bar plots are based on contingency (frequency) tables, made with table().

Bar plots are made with the function barplot(), where the minimal argument is a contingency table. Further graphical parameters can be added, like

• names= for adding names to the columns (vector); this is short for the full argument name names.arg=
• main= for a general title
• col= for colors of the bars (vector)
• legend.text= to add a legend to the graph with the names of the categories (vector)
• beside=TRUE for a grouped bar plot rather than a stacked one

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine", "cadetblue4", "chartreuse4"), legend.text=c("three gears", "four gears", "five gears"), beside=TRUE)


Making bar plots 2: do it yourself


Getting a dataset #2

Kabacoff 2011


Appropriate formats: CSV

• You can’t enter data directly from Excel; you have to save it in another format (which is, thankfully, easy)
• A comma-separated values (.csv) file stores tabular (=table) data in plain-text form
• A .csv file consists of any number of records (=rows) separated by line breaks of some kind
• Each record consists of fields (=columns) separated by some character, most commonly a comma or semicolon
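To make the structure concrete, here is a small sketch that writes a three-line .csv file to a temporary location and reads it back. The column names and values are invented for illustration and are not taken from the course dataset.

```r
# Write a tiny comma-separated file: a header record plus two data records.
# (The contents are made up purely for illustration.)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("language,speakers",
             "A,10",
             "B,25"),
           csv_file)

# Each line becomes a row, each comma-separated field a column
example_data <- read.csv(csv_file)
example_data
```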

Excel may complain; just click ‘Yes’ and ‘OK’ your way through it. Save the file, then close it (when Excel asks you to save it again, say no).

Depending on your software system, you may have to replace the ; with , or you have to tell R what your separator is.
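Two ways of telling R that the separator is a semicolon, sketched on a small invented file: the sep argument of read.csv(), or the variant read.csv2(), which uses a semicolon as separator (and a comma as decimal mark) by default.

```r
# A tiny semicolon-separated file (contents invented for illustration)
semi_file <- tempfile(fileext = ".csv")
writeLines(c("language;speakers",
             "A;10",
             "B;25"),
           semi_file)

# Option 1: state the separator explicitly
with_sep <- read.csv(semi_file, sep = ";")

# Option 2: read.csv2() assumes ";" as the field separator
with_csv2 <- read.csv2(semi_file)

with_sep
```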


Appropriate formats: Tab-delimited

• A tab-delimited file (.txt) stores tabular (=table) data in plain-text form
• A tab-delimited file consists of any number of records (=rows) separated by line breaks of some kind
• Each record consists of fields (=columns) separated by a tab character
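The same idea for a tab-delimited file, as a sketch with invented contents; in an R string a tab character is written as \t, and read.delim() expects tabs as separators by default.

```r
# A tiny tab-delimited file (contents invented for illustration)
tab_file <- tempfile(fileext = ".txt")
writeLines(c("language\tspeakers",
             "A\t10",
             "B\t25"),
           tab_file)

# read.delim() splits each record into fields at the tab characters
tab_data <- read.delim(tab_file)
tab_data
```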


Getting datasets into R #1

Reading a csv file using the search path

On a Mac:

languages<-read.csv("/Users/Rik/Desktop/languagesSA.csv")

In Windows:

languages<-read.csv("C:/Users/Rik/Desktop/languagesSA.csv")

In these commands, languages is the variable name, read.csv is the command, the part of the quoted text up to the last slash (for example /Users/Rik/Desktop/) is the search path, and languagesSA.csv is the file name.

Reading a tab-delimited file using the search path

On a Mac:

languages<-read.delim("/Users/Rik/Desktop/languagesSA.txt")

In Windows:

languages<-read.delim("C:/Users/Rik/Desktop/languagesSA.txt")

Here read.delim is the function and languagesSA.txt is the file name.


Getting datasets into R #2

A way of getting datasets into R with slightly less work (at least if you have to open several files) is by using your working directory.

Your working space in R has a default place on your computer where it looks for files if you don’t add any path: the working directory.

With the function getwd() you can see where it is.

You can either store all documents (.csv, .txt) that you use in R here, or you can change the working directory with the function setwd() and type the path.


Watch out: in Windows you cannot simply copy the path; you have to change backslashes to slashes.

You can check whether you changed the working directory by typing

getwd()
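If you prefer not to retype a copied Windows path by hand, the backslashes can also be converted inside R. This is only a sketch: the path is the example folder used in these slides, and inside an R string each backslash has to be typed twice.

```r
# A Windows path as copied from the file explorer; in an R string
# every backslash must be doubled
copied_path <- "C:\\Users\\Rik\\Desktop"

# Replace every backslash with a forward slash
r_path <- gsub("\\", "/", copied_path, fixed = TRUE)
r_path

# The result could then be passed to setwd(), for example setwd(r_path),
# followed by getwd() to check that the change worked
```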

Now you can just type the name of the file (if you have put it in the wd) to read it into R:

languages<-read.csv("languagesSA.csv")
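A self-contained sketch of reading a file by its bare name. It uses R's temporary directory as a stand-in working directory and a small invented file called example.csv, so it runs without the course dataset; with your own files you would set the working directory to your own folder instead.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Use the temporary directory as a stand-in working directory
setwd(tempdir())

# Put a small file (invented contents) into the working directory
writeLines(c("language,speakers", "A,10", "B,25"), "example.csv")

# No path needed: R looks for the file in the working directory
languages <- read.csv("example.csv")
languages

# Restore the original working directory
setwd(old_wd)
```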


Getting datasets into R #3

Most similar to what you are probably used to is the command

languages<-read.delim(file.choose())

Type the above and see what happens.

Choose our test file.

Type languages (or whatever name you gave to the dataset) and press Enter.


Do-it-yourself

ASSIGNMENT (in class):

For variables of your choice from our LanguagesSA dataset, produce:

A mean and standard deviation

A one-way frequency table

A bar plot of that table

A two-way frequency table

A bar plot (stacked or grouped) of that table