Making bar plots
Plots and contingency tables
Plots are graphical representations of data. Plots of categorical data can be made on the basis of contingency tables.
First plot: pie chart
pie(x)
pie is the function; x is its obligatory argument.
Make a pie chart for the “miles per gallon” (mpg) variable:
1. Make a contingency table of the mpg variable
2. Make a pie chart with the function pie()
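The two steps can be sketched as follows. This is a minimal sketch, assuming the mtcars dataset that ships with base R (it is the dataset used later in these slides); the name mpgtable is only an illustrative choice.

```r
# mtcars is part of base R (package datasets), so nothing needs to be loaded.
mpgtable <- table(mtcars$mpg)   # step 1: contingency table of the mpg variable
pie(mpgtable)                   # step 2: pie chart of that table
```

Because mpg is a numeric variable with many distinct values, the chart has many thin slices.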
First plot: pie chart
Although pie charts are very common, they are criticized by most
statisticians who recommend bar or dot plots over pie charts
because people are able to judge length more accurately than
volume (no more pie charts!)
Bar plot
A bar plot is useful for representing frequency distributions of
categorical data. We can for instance try to make a bar plot of the
number of gears variable (gear).
First we have to make a frequency distribution, then we use the
function barplot()
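A minimal sketch of these two steps, assuming the mtcars dataset from base R; the name geartable is an illustrative choice, not one given on the slide.

```r
geartable <- table(mtcars$gear)  # frequency distribution of the gear variable
geartable                        # print the counts per number of gears
barplot(geartable)               # one bar per category, height = frequency
```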
Bar plot
You can change the names of the bars with the names() function
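One possible way to do this is sketched below, assuming a frequency table of the gear variable in mtcars; the table name and the labels are illustrative.

```r
geartable <- table(mtcars$gear)
# Replace the category labels "3", "4", "5" with more readable names.
names(geartable) <- c("three gears", "four gears", "five gears")
barplot(geartable)
```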
Bar plot
You can make your plot prettier by manipulating the colors, the font, etc. Here is an example for the colors, with the extra argument col
http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
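A small sketch of the col argument, assuming the gear frequency table from mtcars. The three color names are the ones used later in these slides; the variable names are illustrative.

```r
geartable <- table(mtcars$gear)
# One color per bar, given as a character vector of R color names.
barcolors <- c("aquamarine4", "cadetblue4", "chartreuse4")
barplot(geartable, col = barcolors)
```

The function colors() lists every color name that R recognizes.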
Bar plot
You can make your plot prettier by manipulating the colors, the font, etc. You can also add a title with the argument main
Bar plot
You can make your plot prettier by manipulating the colors, the font, etc. Here is an example for changing the font size, with the arguments cex.axis (for the frequency scale on the y-axis), cex.names (for the names we gave to the columns), and cex.main (for the title).
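A sketch that combines a title with the three font-size arguments, assuming the gear frequency table from mtcars. The title text and the size values are arbitrary illustrative choices; values above 1 enlarge the text and values below 1 shrink it.

```r
geartable <- table(mtcars$gear)
names(geartable) <- c("three", "four", "five")
barplot(geartable,
        main = "Number of gears",  # title of the plot
        cex.axis = 0.8,            # smaller numbers on the y-axis
        cex.names = 1.2,           # larger bar names
        cex.main = 1.5)            # larger title
```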
Bar plots of two-way tables
Suppose you want to show that there is some kind of connection between the transmission system (automatic versus manual) and the number of gears.
1. Create a two-way table from the mtcars dataset, using the functions with() and table(). The relevant variables are gear and am. Assign it to some name.
2. Make a barplot of this table.
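These two steps might look like the sketch below. The name mytwowaytable is the one used in the commands on the following slides.

```r
# Rows are the number of gears, columns are am (0 = automatic, 1 = manual).
mytwowaytable <- with(mtcars, table(gear, am))
mytwowaytable
# For a two-way table, barplot() draws one stacked bar per column.
barplot(mytwowaytable)
```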
Bar plots of two-way tables
Is this an insightful table? Why, why not? What can we do to make it more insightful?
Bar plots of two-way tables
1. Adding names to the columns
> barplot(mytwowaytable, names=c("automatic", "manual"))
Bar plots of two-way tables
1. Adding names to the columns
2. Adding a title to the graph
> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears")
Bar plots of two-way tables
1. Adding names to the columns
2. Adding a title to the graph
3. Changing the colors
http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine4", "cadetblue4", "chartreuse4"))
Bar plots of two-way tables
1. Adding names to the columns
2. Adding a title to the graph
3. Changing the colors
4. Adding a legend with the argument legend.text
> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine", "cadetblue4", "chartreuse4"), legend.text=c("three gears", "four gears", "five gears"))
Bar plots of two-way tables
1. Adding names to the columns
2. Adding a title to the graph
3. Changing the colors
4. Adding a legend with the argument legend.text
5. Alternative: grouped bar plot
> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine", "cadetblue4", "chartreuse4"), legend.text=c("three gears", "four gears", "five gears"), beside=TRUE)
Bar plots summary
A bar plot is a good way of visually representing categorical data.
Bar plots are based on contingency (or frequency) tables, which are made with table().
Bar plots are made with the function barplot(), where the minimal argument is a contingency table.
Further graphical parameters can be added, like
• names= for adding names to the columns (vector)
• main= for a general title
• col= for colors of the bars (vector)
• legend.text= to add a legend to the graph, with the names of the categories (vector)
• beside=TRUE for a grouped bar plot rather than a stacked one
> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine", "cadetblue4", "chartreuse4"), legend.text=c("three gears", "four gears", "five gears"), beside=TRUE)
Making bar plots 2: do it yourself
Getting a dataset #2
Kabacoff 2011
Appropriate formats: CSV
• You can’t enter data directly from Excel, you have to
save it in another format (which is, thankfully, easy)
• A comma-separated values (.csv) file stores tabular
(=table) data in plain-text form
• a .csv file consists of any number of records (=rows)
separated by line breaks of some kind
• each record consists of fields (=columns) separated
by some character, most commonly a comma or
semicolon
Excel may complain; just click ‘Yes’ and ‘OK’ your way through it. Save the file, then close it (when Excel asks you to save it again, say no).
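To see what such a file looks like on disk, the sketch below writes a tiny table to a temporary .csv file and prints its raw lines. The data frame toy and its column names are hypothetical, invented only to show the layout.

```r
# Hypothetical toy data, made up purely for illustration.
toy <- data.frame(language = c("A", "B"), speakers = c(10, 20))
csvfile <- tempfile(fileext = ".csv")
write.csv(toy, csvfile, row.names = FALSE, quote = FALSE)
csvlines <- readLines(csvfile)  # read the file back as plain text
csvlines                        # one string per record, fields separated by commas
```

The first line holds the column names; each following line is one record.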
Appropriate formats: CSV
Depending on your software system, you may have to replace the ; with , or you have to tell R what your separator is.
...
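One way to tell R about the separator is the sep argument of read.csv(); read.csv2() is a variant whose default separator is a semicolon (it also expects a decimal comma). The sketch below uses a made-up semicolon-separated file written to a temporary location; the file contents are hypothetical.

```r
semifile <- tempfile(fileext = ".csv")
# Hypothetical semicolon-separated content, for illustration only.
writeLines(c("language;speakers", "A;10", "B;20"), semifile)

wrong  <- read.csv(semifile)             # default sep = ",": everything ends up in one column
right  <- read.csv(semifile, sep = ";")  # separator stated explicitly
right2 <- read.csv2(semifile)            # semicolon is the default separator here

ncol(wrong)
ncol(right)
```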
Appropriate formats: Tab-delimited
• A tab-delimited file (.txt) stores tabular (=table) data
in plain-text form
• A tab-delimited file consists of any number of
records (=rows) separated by line breaks of some
kind
• Each record consists of fields (=columns) separated
by a tab character
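The same idea for a tab-delimited file can be sketched as follows. The toy data and the name languages_toy are hypothetical; read.delim() is the base R function that uses a tab as its default separator.

```r
# Hypothetical toy data, made up purely for illustration.
toy <- data.frame(language = c("A", "B"), speakers = c(10, 20))
tabfile <- tempfile(fileext = ".txt")
write.table(toy, tabfile, sep = "\t", row.names = FALSE, quote = FALSE)

readLines(tabfile)                    # raw lines; the tabs appear as \t
languages_toy <- read.delim(tabfile)  # read.delim() splits fields on tabs by default
languages_toy
```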
Getting datasets into R #1
Reading a csv file using the search path
On a Mac:
languages<-read.csv("/Users/Rik/Desktop/languagesSA.csv")
In Windows:
languages<-read.csv("C:/Users/Rik/Desktop/languagesSA.csv")
In these commands, languages is the variable name, read.csv is the command, /Users/Rik/Desktop/ (or C:/Users/Rik/Desktop/) is the search path, and languagesSA.csv is the file name.
Reading a tab-delimited file using the search path
On a Mac:
languages<-read.delim("/Users/Rik/Desktop/languagesSA.txt")
In Windows:
languages<-read.delim("C:/Users/Rik/Desktop/languagesSA.txt")
Here the function is read.delim and the file name is languagesSA.txt.
Getting datasets into R #2
A way of getting datasets into R with slightly less work (at least if you
have to open several files) is by using your working directory
Your working space in R has a default place on your computer where it
looks for files if you don’t add any path: the working directory
With the function getwd() you can see where it is
You can either store all documents (.csv , .txt) that you use in R here, or
you can change the working directory with the function setwd(), and
type the path.
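A minimal sketch of both functions. It switches to R's temporary directory only because that folder exists on every system; in practice you would type the path of your own folder instead.

```r
oldwd <- getwd()   # remember the current working directory
oldwd
setwd(tempdir())   # change it (tempdir() is just a stand-in for your own folder)
getwd()            # check that the change worked
setwd(oldwd)       # go back to where we started
```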
Getting datasets into R #2
Watch out: in Windows you cannot simply copy the path, you have to
change backslashes to slashes
You can check whether you changed the working directory by typing
getwd()
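If you prefer not to retype a copied Windows path, the conversion can be sketched in R itself. Note that inside an R string a literal backslash has to be written as \\, so a path pasted straight from Windows Explorer still needs editing; the path below is modeled on the one used in these slides.

```r
winpath <- "C:\\Users\\Rik\\Desktop"             # each \\ stands for one backslash
rpath <- gsub("\\", "/", winpath, fixed = TRUE)  # replace every backslash with a slash
rpath                                            # "C:/Users/Rik/Desktop"
```

The result can then be passed to setwd().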
Getting datasets into R #2
Now you can just type the name of the file (if you have put it in the wd)
to read it into R:
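A sketch of what this can look like, using a made-up stand-in file so that the example runs anywhere. The file name languagesSA_toy.csv and its contents are hypothetical; with the course file in your working directory, the call would presumably be read.csv("languagesSA.csv").

```r
oldwd <- getwd()
setwd(tempdir())                                   # stand-in for your own working directory
# Hypothetical toy file, written here only so that there is something to read.
write.csv(data.frame(language = c("A", "B"), speakers = c(10, 20)),
          "languagesSA_toy.csv", row.names = FALSE)

languages_toy <- read.csv("languagesSA_toy.csv")   # file name only, no path
languages_toy
setwd(oldwd)                                       # restore the original directory
```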
Getting datasets into R #3
Most similar to what you are probably used to is the command
languages<-read.delim(file.choose())
Type the above and see what happens.
Choose our test file.
Type languages (or whatever name you gave to the dataset) and press Enter.
Do-it-yourself
ASSIGNMENT (in class):
Produce for variables of your choice from our LanguagesSA dataset:
A mean and standard deviation
A one-way frequency table
A bar plot of that table
A two-way frequency table
A bar plot (stacked or grouped) of that table
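As a rough template for the assignment, the sketch below runs the same five steps on mtcars, used here only as a stand-in because the variables of the LanguagesSA dataset are not listed on these slides. Replace mtcars and the variable names with your own choices; the object names cyltable and cylam are illustrative.

```r
# 1. Mean and standard deviation of a numeric variable
mean(mtcars$mpg)
sd(mtcars$mpg)

# 2. One-way frequency table of a categorical variable
cyltable <- table(mtcars$cyl)
cyltable

# 3. Bar plot of that table
barplot(cyltable, main = "Number of cylinders")

# 4. Two-way frequency table
cylam <- with(mtcars, table(cyl, am))
cylam

# 5. Grouped bar plot of the two-way table (drop beside = TRUE for a stacked one)
barplot(cylam, beside = TRUE,
        names = c("automatic", "manual"),
        legend.text = TRUE)   # TRUE builds the legend from the row names of the table
```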