
An Introduction to R for Epidemiologists using RStudio: the basics

Steve Mooney (much borrowed from C. DiMaggio)

Department of Epidemiology
Columbia University
New York, NY 10032

[email protected]

An Introduction to R for Epidemiologists using RStudio
Introduction to R Concepts and Object Types

SER Summer 2014


Outline

1 getting our hands dirty
    calculating, assigning, combining
    from calculations to programming

2 how R thinks (vs. SAS and SPSS)

3 data

4 packages

5 help

6 objects
    about objects
    vector
    matrix & array
    list
    dataframe

S. Mooney (Columbia University) R intro 2014 2 / 56


getting our hands dirty calculating, assigning, combining

First steps: Use R as a calculator

math operators and functions

arithmetic + , - , * , /

power ^

convert 68 degrees Fahrenheit to Celsius ($C^\circ = \frac{5}{9}(F^\circ - 32)$)

5/9*(68-32)

First type it directly in the console. Then type it into the editor and send it to the console. (Remember how to do that?)


assignment operator: the ‘memory’ key

<-

y <- 5/9*(68-32) #assignment (no display)

y

(y <- 5/9*(68-32)) #assignment (display)


functions

FunctionName(parameter1, parameter2, ...)

math operators and functions

mathematical functions - sqrt, log, exp, sin, cos, tan

simple functions - max, min, length, sum, mean, var, sort

abs(-23) #absolute value

exp(8) # exponentiation

log(exp(8)) # natural logarithm

sqrt(64) # square root


concatenation function: combine or "vectorize"

c()

x <- c(100,90,80,70,60)

x

y <- c("a", "b", "c", "d")

y


Put it together: Vectorized computations

The calculation you tried with 68 (a scalar) also works with the vector you just created:

5/9*(x-32)

z<-5/9*(x-32)

z
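To make the vectorization concrete, here is a small sketch comparing the vectorized expression with an explicit element-by-element loop. The names z_vectorized and z_loop are introduced only for this illustration.

```r
x <- c(100, 90, 80, 70, 60)      # Fahrenheit temperatures from the earlier slide

# Vectorized: the arithmetic is applied to every element at once
z_vectorized <- 5/9 * (x - 32)

# Equivalent explicit loop (more typing, same result)
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- 5/9 * (x[i] - 32)
}

all.equal(z_vectorized, z_loop)  # TRUE
```

In everyday R the vectorized form is usually preferred: it is shorter and harder to get wrong.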


getting our hands dirty from calculations to programming

write your own function: R is a programming language

my.function<-function(x){

5/9*(x-32)

}

my.function(68)

[1] 20

a<-c(134,156,222)

my.function(a)

[1] 56.66667 68.88889 105.55556

We’ll revisit creating functions if we have time...
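As a hedged preview of where functions can go next, the sketch below adds a second parameter with a default value. The name convert.temp and its to parameter are hypothetical, made up for this example.

```r
# Hypothetical helper (not from the slides): convert in either direction,
# with a default value for the second parameter
convert.temp <- function(x, to = "C") {
  if (to == "C") {
    5/9 * (x - 32)       # Fahrenheit to Celsius
  } else {
    9/5 * x + 32         # Celsius to Fahrenheit
  }
}

convert.temp(68)             # uses the default, to = "C"
convert.temp(20, to = "F")   # named parameter overrides the default
```

Because the arithmetic inside is vectorized, convert.temp() also accepts a vector of temperatures, just like my.function().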


how R thinks (vs. SAS and SPSS)


A Quick Warning...

We’re going to shift gears into abstract territory for a little while.

I hope the material that follows will orient you as we learn more concrete pieces of R.

But I want to acknowledge that it gets away from the concrete; please don’t worry if you feel you’re not grasping all the details fully right now.


Programming and Analyzing: How to work with R

In my experience, data analysis is usually an iterative process:

1 Massage data (merge datasets, select the items you want to analyze, ensure measures are created properly, etc.)

2 Call some procedure to do an analytic step (e.g. look at a 2x2 table)

3 Interpret output & generate new questions (back to step 1 or 2)

In SAS (& SPSS(?)), data massage mostly happens in DATA steps and analysis mostly happens in PROC steps.

In R, there’s no formal separation between massage and analysis: we use similar functions for both.
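As a rough sketch of that loop in R, the toy data frame dat below (invented for illustration) is massaged and then tabulated with ordinary function calls; nothing in the code marks where massage ends and analysis begins.

```r
# Toy data, invented for this example
dat <- data.frame(
  age     = c(34, 56, 45, 23, 61, 38),
  exposed = c("yes", "no", "yes", "no", "yes", "no"),
  case    = c("yes", "no", "yes", "no", "no", "yes")
)

# "Massage": create a derived variable and select rows
dat$older <- dat$age >= 40
adults_under_60 <- subset(dat, age < 60)

# "Analysis": a 2x2 table, using the same function-call style
table(adults_under_60$exposed, adults_under_60$case)
```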


Programming and Analyzing: Getting abstract for a moment...

Most data massage and analysis procedures actually have similar components:

1 Three classes of thing you tell the statistical program:
    1 What type of operation to do
    2 How specifically to do it
    3 A dataset to do it on

2 Two types of thing happen when you run the code:
    1 Changes get made to the data
    2 Output or results are returned

Let’s look at how this plays out in SAS, SPSS, and R...


Programming and Analyzing: Function input (SAS example)

In SAS:

1 The type of thing to do is the DATA or PROC XYZ statement.
2 The dataset to use is specified with data=XYZ for a PROC step and set XYZ; for a data step.
3 And everything else is how specifically to do it.

For example, consider the following SAS statement:

PROC FREQ DATA=XYZ; table X*Y/missing; RUN;

FREQ is the type of thing.
XYZ is the dataset (and the X*Y specifies the columns).
/missing is how specifically to do the FREQ.


Programming and Analyzing: Function input (SPSS example)

In SPSS:

1 The type of thing to do is the statement type.
2 The dataset is implicit based on a previous DATA statement.
3 And everything else is how specifically to do it.

For example, consider the following SPSS statement:

crosstabs

/tables X by Y

/missing=report

crosstabs is the type of thing.
The current dataset is the dataset (and X by Y specifies the columns).
/missing=report indicates how specifically to do the crosstab.


Programming and Analyzing: Function input in R

In R:

1 The type of thing to do is the function name.
2 The dataset and how specifically to do it are both parameters to the function.

For example, consider the following R statement:

table(XYZ$X, XYZ$Y, useNA="no")

table is the type of thing.
XYZ$X and XYZ$Y are the data.
useNA="no" is how specifically to handle missing data (here, leave missing values out of the table).
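In base R, table() controls missing-value handling through its useNA argument (it has no na.rm argument). A minimal sketch, using a toy XYZ data frame invented for illustration:

```r
# Toy data with one missing exposure value (invented for illustration)
XYZ <- data.frame(
  X = c("exposed", "exposed", "unexposed", NA, "unexposed"),
  Y = c("case", "control", "control", "case", "case")
)

t_no  <- table(XYZ$X, XYZ$Y, useNA = "no")     # drop records with missing values
t_any <- table(XYZ$X, XYZ$Y, useNA = "ifany")  # count missing values as their own category

t_no
t_any
```

The second form is the closer analogue of the /missing option in the SAS example.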


Programming and Analyzing: Function output in R

I claimed that analytic steps have up to two kinds of effects:

1 Changes made to the data
2 Output or results returned

In SAS and SPSS:

1 There is an output window that displays the results of an analytic procedure.

2 Some procedures change data and others do not. The programmer knows which procedures modify the data.

In R:

1 An analytic function typically returns an object whose default display is the result of interest.

2 If the programmer wants data modified by the procedure, she or he usually works with the return value of the function in the next programming step.


Programming and Analyzing: Function output in R

Consider the R statement

table(XYZ$X, XYZ$Y, useNA="no")

This is a function that returns an object (a 2x2 table, which behaves like a matrix, in this case) whose default visualization looks like a 2x2 table.

If you want a chi-square test on that 2x2 table, you can use the output from the table function as the input to the chisq.test function as follows:

chisq.test(table(XYZ$X, XYZ$Y, useNA="no"))

Using return values rather than side effects is characteristic of a functional programming model of language design.

Don’t worry

This may seem complicated or abstract, but it will become clearer after using R more.
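One way to see return values at work is to store each result and inspect it. The sketch below uses invented counts; tab and res are illustrative names.

```r
# Toy data, invented for illustration
XYZ <- data.frame(
  X = rep(c("exposed", "unexposed"), times = c(40, 60)),
  Y = c(rep(c("case", "control"), times = c(25, 15)),
        rep(c("case", "control"), times = c(20, 40)))
)

tab <- table(XYZ$X, XYZ$Y)   # the return value is an object we can keep
res <- chisq.test(tab)       # ...and pass to another function

class(tab)       # "table"
res$p.value      # individual pieces of the result can be extracted
names(res)       # what else the result object contains
```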


data


the cbind() function: combining vectors into matrices

weight <- c(134, 156, 222)

height <- c(60, 63, 72)

bmi <- (weight*703)/height^2

cbind(weight, height, bmi)

weight height bmi

[1,] 134 60 26.16722

[2,] 156 63 27.63114

[3,] 222 72 30.10532


getting your own data into R: "there’s a function for that"

read.table() (or read.csv() / read.fwf()) is how you get data into base R

but RStudio’s Import Dataset can generate the code for you...

cars<-read.table(

"http://www.columbia.edu/~sjm2186/SER2014/cars.txt",

header=TRUE, stringsAsFactors=FALSE)

str(cars)

We will revisit this...
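If the course URL is unreachable, the same call can be tried on inline text through read.table()'s text argument. The columns below are invented for illustration and are not the contents of cars.txt.

```r
# A small stand-in for a file on disk; the columns are invented for illustration
raw_text <- "make mpg cyl
honda 33.9 4
ford 15.8 8
fiat 27.3 4"

cars_demo <- read.table(text = raw_text, header = TRUE, stringsAsFactors = FALSE)
str(cars_demo)        # make is character; mpg is numeric; cyl is integer
```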


packages


Packages contain code that enables extra functionality in R

Analogous to a SAS file containing several macros

install.packages("epitools")

library(epitools)

epitab(c(10, 20, 30, 40))

We will revisit these as well...
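Installing is needed once per machine, while library() is needed once per session. A common pattern, sketched here under the assumption that you want to avoid reinstalling every time a script runs, is to check first with requireNamespace():

```r
# Load a package only if it is already available; otherwise say how to get it.
# "epitools" is the package from the slide; installing it needs internet access.
pkg <- "epitools"
if (!requireNamespace(pkg, quietly = TRUE)) {
  message("Package ", pkg, " is not installed; run install.packages(\"", pkg, "\") once.")
} else {
  library(pkg, character.only = TRUE)   # load it for this session
}
```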


help


getting help

R has a lot of built-in help mechanisms...

help() opens help page

apropos() displays all objects matching topic

library(help=packageName) help on a specific package

vignette(package="packageName") lists a package’s vignettes

help(sample) ; ?sample ; ??sample

apropos("sam")

library(help=epitools)

vignette(package="utils")

vignette("Sweave")


...but web resources can be even more helpful:

tutorial: http://www.ats.ucla.edu/stat/r/

search: http://www.r-project.org/search.html

books: Venables, Aragon, etc.

Two major online communities:

R mailing list archive: http://r.789695.n4.nabble.com/

Stack Overflow: http://stackoverflow.com/questions/tagged/r


objects


5 important objects

objects are "specialized data structures"

1 vector - collection of like elements (numbers, characters...)

2 matrix - 2-dimensional vector

3 array - vector with more than 2 dimensions

4 list - collection of elements of any kind, including other objects

5 dataframe - tabular data set, each row a record, each column a variable made up of like elements


objects for epidemiologists

matrix for contingency tables, e.g. 2x2 tables

arrays for stratified tables

dataframe for observations and variables

factors for categorical variables

    numeric representation of characters
    read.table converted characters to factors by default before R 4.0.0; since then the default is stringsAsFactors=FALSE
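A short sketch of what "numeric representation of characters" means in practice, using a toy vector invented for illustration:

```r
sex <- factor(c("M", "F", "F", "M"))   # toy data, invented for illustration

levels(sex)       # the category labels, sorted alphabetically: "F" "M"
as.integer(sex)   # the underlying integer codes: 2 1 1 2
table(sex)        # counts per category
```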


examples of R objects

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

y <- matrix(x, nrow = 2)

z <- array(x, dim = c(2, 3, 2))

mylist <- list(x, y, z)

names <- c("alice", "bob", "charlie")

gender <- c("girl", "boy", "boy")

age <- c(28, 22, 34)

race <- factor(c("Asian", "Asian", "Black"),

levels=c("Asian", "Black", "White"))

data<- data.frame(names, gender, age, race)


objects about objects


modes: atomic vs recursive objects

mode - type of data: numeric, character, logical (a factor has mode numeric but class factor)

age <- c(34, 20); mode(age)

lt25 <- age<25

lt25

mode(lt25)

atomic - only one mode
    character, numeric, or logical, i.e. vectors, matrices, arrays
    logical (1 for TRUE, 0 for FALSE)
    categorical - labels are displayed, but values are stored as integer codes (factors)

recursive - can hold more than one mode
    lists, data frames, functions
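The difference between mode and class is easiest to see by asking R directly. A minimal sketch with toy objects (the names are illustrative):

```r
age  <- c(34, 20)
race <- factor(c("Asian", "Black"))
info <- list(age, race)

mode(age);  class(age)    # "numeric", "numeric"
mode(race); class(race)   # "numeric", "factor": a factor is integer codes plus labels
mode(info); class(info)   # "list",    "list"

is.atomic(age)            # TRUE
is.recursive(info)        # TRUE
```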


atomic objects: all elements the same

vector: one dimension

y <- c("Tom", "Dick", "Harry") ; y #character

x <- c(1, 2, 3, 4, 5) ; x #numeric

z <- x<3 ; z #logical

matrix: two-dimensional vector

x <- c("a", "b", "c", "d")

y <- matrix(x, 2, 2) ; y

array: n-dimensional vector

x <- 1:8

y <- array(x, dim=c(2, 2, 2)) ; y


recursive objects: elements can differ

list: collections of data

x <- c(1, 2, 3)

y <- c("Male", "Female", "Male")

z <- matrix(1:4, 2, 2)

xyz <- list(x, y, z)

dataframe: tabular (2-dimensional) list

subjno <- c(1, 2, 3, 4) ; age <- c(34, 56, 45, 23)

sex <- c("Male", "Male", "Female", "Male")

case <- c("Yes", "No", "No", "Yes")

mydat <- data.frame(subjno, age, sex, case) ; mydat


coercion: changing an object’s mode

this is important: R will automatically coerce all the elements in an atomic object to a single mode (character > numeric > logical)

c("hello", 4.56, FALSE)

c(4.56, FALSE)
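Storing the two results makes the coercion visible. The names a and b are used only for this sketch:

```r
a <- c("hello", 4.56, FALSE)   # everything becomes character
b <- c(4.56, FALSE)            # FALSE becomes 0

a; mode(a)    # "hello" "4.56" "FALSE"; "character"
b; mode(b)    # 4.56 0.00; "numeric"

sum(c(TRUE, TRUE, FALSE))      # logicals count as 1 and 0 in arithmetic: 2
```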


coercing objects: do it yourself

is.xxx / as.xxx - to assess / coerce objects
xxx = vector, matrix, array, list, data.frame, function, character, numeric, factor, na etc...

is.matrix(1:3) # false

as.matrix(1:3)

is.matrix(as.matrix(1:3)) # true

# coercing factor to character

sex <- factor(c("M", "M", "M", "M", "F", "F", "F", "F"))

sex

unclass(sex) #does not coerce into character

as.character(sex) #works
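A related pitfall, sketched with an invented dose variable: when numbers have been stored as a factor, as.numeric() returns the integer codes rather than the values, so the usual workaround goes through as.character() first.

```r
dose <- factor(c("10", "20", "20", "5"))   # numbers stored as a factor

as.numeric(dose)                  # integer codes, not the doses: 1 2 2 3
as.numeric(as.character(dose))    # the actual values: 10 20 20 5
```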


Review: basic characteristics of R objects

Objects - vector, matrix, array, list, dataframe

mode() - "type" of object: numeric, character, logical (factors report mode numeric)

vectors and matrices - atomic, one mode only
lists and data frames - recursive, can be of >1 mode

class() - for simple vectors, usually the same as mode

more complex objects, such as arrays and data frames, have their own class
class affects how objects are printed, plotted and otherwise handled


objects vector


vectors are 1-dimensional sequences of like elements

the basic building block of data in R

use them for quick data entry


fun with vectors

y<-1:5 #create a vector of consecutive integers

y+2 #scalar addition

2*y #scalar multiplication

x<-c(1,3,2,10,5)

cumsum(x)


more fun with vectors: vectorized arithmetic

c(1,2,3,4)/2

c(1,2,3,4)/c(4,3,2,1)

log(c(0.1,1,10,100), 10)

c(1,2,3,4) + c(4,3)

c(1,2,3,4) + c(4,3,2)
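The last two lines rely on recycling: the shorter vector is reused until it matches the longer one. A small sketch of what R does in each case (res is an illustrative name; the warning is suppressed only to keep the output clean):

```r
c(1, 2, 3, 4) + c(4, 3)      # c(4, 3) is reused as 4 3 4 3, giving 5 5 7 7

# Lengths 4 and 3 do not divide evenly: R recycles anyway and issues a warning
res <- suppressWarnings(c(1, 2, 3, 4) + c(4, 3, 2))
res                          # 5 5 5 8 (the shorter vector was used as 4 3 2 4)
```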


creating numerical vectors: sequences

the sequence operator :

-9:8

seq() greater flexibility

seq(1, 5, by = 0.5) # specify interval

seq(1, 5, length.out = 8) # specify length
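A few related base R helpers, shown as a quick sketch; which one is handiest depends on the task:

```r
seq(1, 5, by = 0.5)          # 1.0 1.5 ... 5.0 (9 values)
seq(1, 5, length.out = 5)    # 1 2 3 4 5
seq_len(4)                   # 1 2 3 4
rep(c("a", "b"), times = 2)  # "a" "b" "a" "b"
```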


operations on vectors

x <- rnorm(100)

sum(x)

x <- rep(2, 10)

cumsum(x)

mean(x)

sum(x)/length(x)

var(x) #sample variance

sd(x)

sqrt(var(x)) #sample standard deviation

x <- rnorm(100)

y <- rnorm(100)

var(x, y) # covariance


logical vectors: the special vector...

series of TRUEs and FALSEs (Ts and Fs)

created with relational operators:

<, >, <=, >=, ==, !=

used to index, select and subset data


about logical vectors

relational operators produce logical vectors; they are the key to indexing, and indexing is the key to manipulating data

< <= > >= == !=

x<-1:26

temp<- x > 13 #logical vector temp

#same length as vector x

#TRUE= 1, when condition met

#FALSE = 0, when not met

sum(temp)

We will revisit logical vectors when we discuss indexing (coming soon...)
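As a brief preview of indexing, here is a sketch of what the logical vector temp can do beyond being summed. LETTERS is a built-in vector of the 26 capital letters.

```r
x <- 1:26
temp <- x > 13

sum(temp)        # how many elements satisfy the condition: 13
mean(temp)       # what proportion satisfy it: 0.5
x[temp]          # the elements themselves: 14 15 ... 26
which(temp)      # their positions
LETTERS[temp]    # use the same logical vector to subset another vector
```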


objects matrix & array


a matrix is a 2-dimensional vector: 2x2 and contingency tables

Option 1: define the matrix from raw data:

myMatrix<-matrix(c("a","b","c","d"),2,2)

myMatrix

myMatrix2<-matrix(c("a","b","c","d"),2,2, byrow=TRUE)

myMatrix2

colnames(myMatrix2)<-c("case", "control")

rownames(myMatrix2)<-c("exposed", "unexposed")

myMatrix2
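With counts in place of the letters a to d, the same layout gives a working 2x2 table. The counts below are invented for illustration, and the odds ratio is computed by hand as a sketch rather than with a package function.

```r
# Counts invented for illustration
tab <- matrix(c(30, 70,
                10, 90),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("exposed", "unexposed"),
                              c("case", "control")))
tab

# Cross-product odds ratio: (a * d) / (b * c)
odds_ratio <- (tab["exposed", "case"] * tab["unexposed", "control"]) /
              (tab["exposed", "control"] * tab["unexposed", "case"])
odds_ratio
```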


cbind and rbind

Option 2: define the data by binding vectors together:

names<-c("Alice", "Bob", "Charlie")

ages<-c(6,7,8)

names

ages

cbind(names, ages)

rbind(names, ages)


caution: recycling

cbind and rbind - will recycle data

when performing vector or mixed vector and array arithmetic, short vectors are extended by recycling till they match the size of the other operands

R may issue a warning message, but still completes the operation


creating a matrix from data: the more common scenario...

table() - from characters

titanic<-read.csv(

"http://www.columbia.edu/~sjm2186/SER2014/titanic.csv",

stringsAsFactors=FALSE) #load titanic data

str(titanic) # Check the structure

table(titanic$sex,titanic$survived) # Make the matrix


an array is an n-dimensional vector: stratified epi tables

stratified titanic survival table:

sex vs. survival vs. passenger class

table(titanic$sex,titanic$survived, titanic$pclass)
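The result of a three-variable table() call is a 3-dimensional array with one 2x2 layer per stratum. A sketch with a handful of invented records (not the titanic file) shows how to pull out a single stratum:

```r
# Toy records, invented for illustration (not the titanic file)
toy <- data.frame(
  sex      = c("female", "male", "male", "female", "male", "female"),
  survived = c(1, 0, 1, 1, 0, 0),
  pclass   = c(1, 1, 2, 2, 3, 3)
)

strat <- table(toy$sex, toy$survived, toy$pclass)
dim(strat)          # 2 x 2 x 3: sex by survival by class
strat[, , "1"]      # the 2x2 table for first class only
```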


objects list


a list is a recursive collection of unlike elements: like epi "variables" and "observations"

often used to ”store” function results

str() is your friend

also, see stackoverflow discussion

x <- 1:5 ; y <- matrix(c("a","c","b","d"), 2,2)

z <- c("Peter", "Paul", "Mary")

mm <- list(x, y, z)

mm

str(mm)
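Getting elements back out of a list is easier when they are named. The sketch below rebuilds mm with element names chosen only for illustration:

```r
# Naming the elements makes them easier to retrieve (names are illustrative)
mm <- list(ids = 1:5,
           grid = matrix(c("a", "c", "b", "d"), 2, 2),
           singers = c("Peter", "Paul", "Mary"))

mm$singers        # extract by name
mm[["grid"]]      # same idea with double brackets
mm[[1]][2]        # second value of the first element: 2
class(mm["ids"])  # single brackets return a (shorter) list
```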


objects dataframe


dataframes: tabular epi data sets

2-dimensional tabular lists with equal-length fields
    each row is a record or observation
    each column is a field or variable (usually numeric vectors or factors)

data(infert)

str(infert)

head(infert)

”a list that behaves like a matrix”
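A quick sketch of that dual nature, using the built-in infert data: list-style and matrix-style access both work on the same object.

```r
data(infert)              # built-in dataset used on the slide

is.list(infert)           # TRUE: a data frame is a list of equal-length columns
dim(infert)               # matrix-like dimensions: rows, columns
infert$age[1:5]           # list-style access to one column
infert[1:3, c("age", "case")]   # matrix-style [row, column] indexing
```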


creating data frames

1 data.frame()

x <- data.frame(id=1:2, sex=c("M","F"))

2 read.table(), read.csv(), read.delim(), read.fwf()

titanic<-read.csv(

"http://www.columbia.edu/~sjm2186/SER2014/titanic.csv",

stringsAsFactors=FALSE) #load titanic data

str(titanic)

(caution: before R 4.0.0 the default was char → factor; whole-number columns are read as integer)


Exercises

You should now be able to complete exercises 1 and 2 in http://www.columbia.edu/~sjm2186/SER2014/Exercises.pdf
