Rattle to R
R Bohn, April 21, 2015 (updated 2018)

R is …

• A programming language
• (source) code
• Interactive language (opposite of compiled)
• All R code and (non-visual) results are just text
• Copy and paste
• Manipulate the text = manipulate the code
• Save in word processor



We should look at the data here (Exploratory)


Paste this into RStudio

# Rattle timestamp: 2015-04-21 11:25:13 x86_64-apple-darwin13.4.0

# Note the user selections.
# Build the training/validate/test datasets.

set.seed(crv$seed)
crs$nobs <- nrow(crs$dataset) # 1436 observations

crs$sample <- crs$train <- sample(nrow(crs$dataset), 0.7*crs$nobs) # 1005 observations

crs$validate <- sample(setdiff(seq_len(nrow(crs$dataset)), crs$train), 0.15*crs$nobs) # 215 observations

crs$test <- setdiff(setdiff(seq_len(nrow(crs$dataset)), crs$train), crs$validate) # 216 observations
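
The partitioning logic above can be tried without Rattle. The following is a minimal sketch, assuming the built-in mtcars data frame as a stand-in for crs$dataset and an arbitrary seed in place of crv$seed. The names dataset, all_rows, train, validate and test are illustrative, not Rattle's. floor() is written out to make explicit the rounding down that the observation counts in the comments above imply.

```r
# Hypothetical stand-in for crs$dataset: any data frame works here.
dataset <- mtcars                  # 32 rows, ships with R
set.seed(42)                       # arbitrary seed; Rattle uses crv$seed

nobs <- nrow(dataset)
all_rows <- seq_len(nobs)

# 70% of the row numbers, chosen at random, form the training set.
train <- sample(all_rows, floor(0.7 * nobs))

# 15% of all rows, drawn only from rows not already in train.
validate <- sample(setdiff(all_rows, train), floor(0.15 * nobs))

# Everything left over is the test set.
test <- setdiff(setdiff(all_rows, train), validate)

# Sizes of the three partitions.
c(train = length(train), validate = length(validate), test = length(test))
```

The three vectors hold row numbers, not data. The rows themselves are pulled out later with indexing such as dataset[train, ].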


# The following variable selections have been noted.

crs$input <- c("Age_08_04", "Mfg_Year", "KM", "Fuel_Type", "HP", "Doors", "Quarterly_Tax", "Weight", "Guarantee_Period", "TFC_Met_Color", "TFC_Automatic", "TFC_Mfr_Guarantee", "TFC_BOVAG_Guarantee", "TFC_ABS", "TFC_Airbag_1", "TFC_Airco", "TFC_Automatic_airco", "TFC_Power_Steering")

crs$numeric <- c("Age_08_04", "Mfg_Year", "KM", "HP", "Doors", "Quarterly_Tax", "Weight", "Guarantee_Period")

crs$categoric <- c("Fuel_Type", "TFC_Met_Color", "TFC_Automatic", "TFC_Mfr_Guarantee", "TFC_BOVAG_Guarantee", "TFC_ABS", "TFC_Airbag_1", "TFC_Airco", "TFC_Automatic_airco", "TFC_Power_Steering")


crs$target <- "Price"
crs$risk <- NULL
crs$ident <- "Id"

crs$ignore <- c ("Model", "Mfg_Month", "Met_Color", "Color", "Automatic", "CC", "Cylinders", "Gears", "Mfr_Guarantee", "BOVAG_Guarantee", "ABS", "Airbag_1", "Airbag_2", "Airco", "Automatic_airco", "Boardcomputer", "CD_Player", "Central_Lock", "Powered_Windows", "Power_Steering", "Radio", "Mistlamps", "Sport_Model", "Backseat_Divider", "Metallic_Rim", "Radio_cassette", "Parking_Assistant", "Tow_Bar", "TFC_Airbag_2", "TFC_Boardcomputer", "TFC_CD_Player", "TFC_Central_Lock", "TFC_Powered_Windows", "TFC_Radio", "TFC_Mistlamps", "TFC_Sport_Model", "TFC_Backseat_Divider", "TFC_Metallic_Rim", "TFC_Radio_cassette", "TFC_Parking_Assistant", "TFC_Tow_Bar")

crs$weights <- NULL


#============================================================
# Regression model

# Build a Regression model.

crs$glm <- lm(Price ~ ., data=crs$dataset[crs$train,c(crs$input, crs$target)])

# Generate a textual view of the Linear model.

print(summary(crs$glm)) # This is the second key line

cat('==== ANOVA ====\n\n')
print(anova(crs$glm))
print(" ")

# Time taken: 0.03 secs
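
To run the same steps without the crs objects that Rattle creates, a sketch like the following may help. It assumes the built-in mtcars data in place of the car-price data, with mpg as the target. train_rows, inputs, target and fit are hypothetical names chosen for illustration.

```r
# mtcars ships with R; it stands in for the car-price data on the slides.
train_rows <- 1:22                 # hypothetical training rows
inputs <- c("wt", "hp", "cyl")     # hypothetical input columns
target <- "mpg"                    # hypothetical target column

# Same shape as the Rattle call:
# rows = training set, columns = inputs plus target.
fit <- lm(mpg ~ ., data = mtcars[train_rows, c(inputs, target)])

print(summary(fit))   # coefficients, standard errors, R-squared
print(anova(fit))     # sequential sums of squares, one line per term
```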


R cookbook page on regression

http://proquest.safaribooksonline.com/book/programming/r/9780596809287/11dot-linear-regression-and-anova/id3419297

The lm function returns a model object that you can assign to a variable:

> m <- lm(y ~ u + v + w)

From the model object, you can extract important information using specialized functions. The most important function is summary:

> summary(m)
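
Besides summary, base R has several other extractor functions for a model object. The sketch below uses simulated data, since the Cookbook excerpt does not supply any. The values of y, u, v and w and the seed are made up for illustration.

```r
# Hypothetical example data; y, u, v, w mirror the names in the excerpt.
set.seed(1)
u <- rnorm(50)
v <- rnorm(50)
w <- rnorm(50)
y <- 1 + 2 * u - 3 * v + 0.5 * w + rnorm(50)

m <- lm(y ~ u + v + w)

coef(m)                # estimated coefficients
confint(m)             # 95% confidence intervals for the coefficients
head(residuals(m))     # first few residuals
summary(m)$r.squared   # R-squared pulled out of the summary object
```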


Specifying a model

• crs$glm <- lm(Price ~ . + KM*Mfg_Year, data=crs$dataset[crs$train,c(crs$input, crs$target)])

• This line runs the model with an interaction term: KM*Mfg_Year

• summary(crs$glm)

• This prints the result

• Comment: You can see why you should start variable names with CAPS or another signifier

• In the data= argument: columns = causes and target; rows = training set
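
The row and column roles in the data= argument, and the effect of the interaction term, can be seen in a small self-contained sketch. It assumes the built-in mtcars data. inputs, target and train are hypothetical stand-ins for crs$input, crs$target and crs$train.

```r
inputs <- c("wt", "hp", "qsec")   # hypothetical stand-in for crs$input
target <- "mpg"                   # hypothetical stand-in for crs$target
train  <- 1:24                    # hypothetical stand-in for crs$train

# Rows come first in the brackets, columns second.
model_data <- mtcars[train, c(inputs, target)]

# `.` means "every other column in data".
# wt * hp expands to wt + hp + wt:hp; wt and hp are already covered by `.`,
# so the only new term is the interaction wt:hp.
fit <- lm(mpg ~ . + wt * hp, data = model_data)

names(coef(fit))   # the interaction coefficient is labelled wt:hp
```

Writing . + wt:hp instead would add only the interaction term and, here, give the same model.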


Learning Objectives week 4; How did we do?

• Introduce continuous outcomes, aka “regression”
• Setting up linear models:
  • Dummy variables for ‘factors’
  • Modeling nonlinearities: interactions
  • Transformations, e.g. square, log of x
• Interpreting regression results:
  • Physical meaning of the coefficients
  • Which variables matter? Importance vs statistical significance
  • Measure overall model performance: Mean Absolute Error
• Other differences: BDA vs Classical Stats = Hypothesis testing
• Homework: estimating EPA fuel efficiency
• Using Rattle and R together
• Transforming data
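
Several of these objectives (dummy variables for factors, a log transformation, and Mean Absolute Error) can be illustrated together. This is a minimal sketch, assuming the built-in mtcars data and a hypothetical holdout of every fourth row. It is not the homework data or the split Rattle would produce.

```r
cars <- mtcars
cars$cyl <- factor(cars$cyl)       # treat cylinder count as a category

# Hypothetical split: every fourth row is held out, the rest are training.
holdout <- seq(4, nrow(cars), by = 4)
train <- setdiff(seq_len(nrow(cars)), holdout)

# lm() expands the factor cyl into dummy variables (cyl6, cyl8);
# log(hp) is a transformation written directly in the formula.
fit <- lm(mpg ~ cyl + wt + log(hp), data = cars[train, ])
names(coef(fit))

# Mean Absolute Error on rows the model did not see during fitting.
pred <- predict(fit, newdata = cars[holdout, ])
mae <- mean(abs(cars$mpg[holdout] - pred))
mae
```

The MAE is in the units of the target (miles per gallon here), which makes it easier to interpret than R-squared when judging overall performance.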