
What factors are most responsible for height? Outcome = (Model) + Error



Analytics & History: 1st Regression Line

The first “Regression Line”


Galton’s Notebook on Families & Height


Galton’s Family Height Dataset

Columns: X1, X2, X3 (predictors) and Y (outcome)


> getwd()
[1] "C:/Users/johnp_000/Documents"

> setwd(dir)    # dir: path to the folder that holds GaltonFamilies.csv


Dataset Input

h <- read.csv("GaltonFamilies.csv")

Here h is the object, read.csv() is the function, and "GaltonFamilies.csv" is the filename.


Inspect the data: str() and summary()

Data Types: Numbers and Factors (Categorical)
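A minimal sketch of the two kinds of columns, assuming R 4.0.0 or later (where read.csv() no longer turns text columns into factors by default) and using a few hypothetical rows written to a temporary file in place of GaltonFamilies.csv. The column names follow the ones used on these slides; the values are invented.

```r
# Hypothetical rows standing in for GaltonFamilies.csv (values are invented)
csv_text <- c(
  "family,father,mother,gender,childHeight",
  "1,70.0,64.0,male,69.0",
  "1,70.0,64.0,female,64.5",
  "2,68.0,63.0,male,68.5",
  "2,68.0,63.0,female,63.5"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_text, path)

h <- read.csv(path)
str(h)      # numbers are num/int; gender is chr when stringsAsFactors = FALSE

# Store the categorical variable as a factor so R treats it as groups
h$gender <- factor(h$gender)
str(h)      # gender is now a factor with 2 levels: "female", "male"
summary(h)  # numeric columns get quartiles; the factor gets counts per level
```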


Outline

• One Variable: Univariate (the dependent / outcome variable)
• Two Variables: Bivariate (the outcome and each predictor)
• All Four Variables: Multivariate


Steps

• Y, Child’s Height (continuous): histogram
• X1, X2, Dad’s and Mom’s Height (continuous): scatterplot against Y
• X3, Gender (categorical): boxplot of Y by group
• All variables together: linear regression
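The final step, linear regression, fits the title's Outcome = (Model) + Error directly. A minimal sketch with lm(), assuming a simulated data frame `sim` that uses the same column names as h; the coefficients used to generate it are invented, not Galton's estimates.

```r
set.seed(1)
n <- 300

# Simulated stand-in for h: NOT Galton's data; the generating coefficients are made up
sim <- data.frame(
  father = rnorm(n, mean = 69, sd = 2.5),
  mother = rnorm(n, mean = 64, sd = 2.3),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
sim$childHeight <- 15 + 0.4 * sim$father + 0.3 * sim$mother +
  5 * (sim$gender == "male") + rnorm(n, sd = 2)   # the rnorm() term is the "error"

# Model: two continuous predictors (X1, X2) and one categorical predictor (X3)
fit <- lm(childHeight ~ father + mother + gender, data = sim)
summary(fit)

# Outcome = (Model) + Error: fitted values plus residuals give back the outcome
reconstructed <- fitted(fit) + residuals(fit)
all.equal(unname(reconstructed), sim$childHeight)   # TRUE
```

On the real data the call would have the same shape, with data = h.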


Frequency Distribution, Histogram

hist(h$childHeight)


Density Plot

The total area under the density curve is 1.

plot(density(h$childHeight))
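A quick numeric check of the "area = 1" property, using simulated heights (an assumption; not the Galton data). It approximates the area under the density() curve with a Riemann sum over its evenly spaced grid, and also shows that a histogram on the density scale (what freq = FALSE plots) has bar areas summing to 1, which is what makes overlaying a dnorm() curve on such a histogram meaningful.

```r
set.seed(2)
heights <- rnorm(500, mean = 66, sd = 3.5)   # simulated heights, not Galton's data

# Kernel density estimate: approximate the area with a Riemann sum over its grid
d <- density(heights)
area_density <- sum(d$y) * diff(d$x[1:2])
area_density             # close to 1

# Histogram on the density scale: bar height times bar width, summed, is 1
hh <- hist(heights, breaks = 25, plot = FALSE)
area_hist <- sum(hh$density * diff(hh$breaks))
area_hist                # 1 (up to floating-point rounding)
```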


hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)

Mode: the distribution appears bimodal (two peaks).


Industries / Organizations Creating and Using R

Industry                  Pct.
Research                  24%
Higher Education           7%
Information Technology     9%
Computer Software          7%
Financial Services         6%
Banking                    2%
Pharmaceuticals            4%
Biotechnology              4%
Market Research            3%
Management Consulting      3%
Total                     69%

Hadley Wickham, Asst. Professor of Statistics at Rice University
Packages: ggplot2, plyr, reshape, rggobi, profr

http://ggplot2.org/


ggplot2

library(ggplot2)
h.gg <- ggplot(h, aes(childHeight))
h.gg + geom_histogram(binwidth = 1) + labs(x = "Height", y = "Frequency")
h.gg + geom_density()


ggplot2

h.gg <- ggplot(h, aes(childHeight)) + theme(legend.position = "right")
h.gg + geom_density() + labs(x = "Height", y = "Density")
h.gg + geom_density(aes(fill = factor(gender)), linewidth = 2)  # ggplot2 < 3.4.0: use size = 2



Correlation and Regression


Covariance

1. Calculate the difference between the mean and each person’s score for the first variable (x).
2. Calculate the difference between the mean and their value for the second variable (y).
3. Multiply these “error” values.
4. Add these values to get the cross-product deviations.
5. The covariance is the average of the cross-product deviations, dividing by N − 1 for a sample:

$$\operatorname{cov}(x, y) = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$


Covariance

[Figure: each person’s deviation from the mean on X and on Y]

Persons 2, 3, and 5 appear to have deviations of similar magnitude from their means.


Covariance

• Calculate the error [deviation] between the mean and each subject’s score for the first variable (x).
• Calculate the error [deviation] between the mean and their score for the second variable (y).
• Multiply these error values.
• Add these values to get the cross-product deviations.
• The covariance is the average cross-product deviation (dividing by N − 1, here 5 − 1 = 4):

$$
\begin{aligned}
\operatorname{cov}(x, y) &= \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{N - 1} \\
&= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4} \\
&= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25
\end{aligned}
$$
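The five steps can be sketched in R as a small function. A minimal sketch: `cov_by_steps()` is a hypothetical helper name, and because the worked example only shows each person's deviations, those deviation scores stand in for the raw scores. Covariance does not change when a constant is added to either variable, so the result is the same as for the original scores.

```r
# Covariance by hand, following the five steps
cov_by_steps <- function(x, y) {
  dev_x <- x - mean(x)          # step 1: deviations from the mean of x
  dev_y <- y - mean(y)          # step 2: deviations from the mean of y
  cross <- dev_x * dev_y        # step 3: multiply the paired deviations
  total <- sum(cross)           # step 4: sum of cross-product deviations
  total / (length(x) - 1)       # step 5: average, using N - 1 for a sample
}

# Deviation scores from the worked example (each set already sums to zero)
x <- c(-0.4, -1.4, -1.4, 0.6, 2.6)
y <- c(-3, -2, -1, 2, 4)

cov_by_steps(x, y)   # 4.25
cov(x, y)            # base R agrees: 4.25
```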


Standardizing the Covariance

• Covariance depends upon the units of measurement.
• To remove the units, normalize the data: divide by the standard deviations of both variables.
• The standardized version of covariance is known as the correlation coefficient:

$$r = \frac{\operatorname{cov}(x, y)}{s_x s_y}$$
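A minimal sketch of standardizing the covariance, reusing the deviation scores from the covariance worked example so the snippet is self-contained. It computes r = cov(x, y) / (s_x s_y) by hand, compares it with cor(), and rescales x by 2.54 (as when converting inches to centimetres) to show that covariance changes with the units while correlation does not.

```r
x <- c(-0.4, -1.4, -1.4, 0.6, 2.6)
y <- c(-3, -2, -1, 2, 4)

# Standardize the covariance by both standard deviations
r_by_hand <- cov(x, y) / (sd(x) * sd(y))
r_by_hand
cor(x, y)                # same value from base R

# Change the units of x (e.g. inches to centimetres)
x_cm <- x * 2.54
cov(x_cm, y)             # 2.54 times cov(x, y): covariance carries units
cor(x_cm, y)             # unchanged: correlation is unit-free
```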


Correlation

?cor

cor(h$father, h$childHeight)

0.2660385


Scatterplot Matrix: pairs()


Correlations Matrix

library(car)
scatterplotMatrix(~ father + mother + childHeight, data = h)
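cor() also accepts a whole data frame of numeric columns and returns the full correlation matrix that a scatterplot matrix visualizes. A minimal sketch with simulated data: the columns mimic h's father, mother, and childHeight, but the values and the relationship between them are invented.

```r
set.seed(42)
n <- 200

# Simulated stand-in for the numeric columns of h (NOT Galton's data)
father <- rnorm(n, mean = 69, sd = 2.5)
mother <- rnorm(n, mean = 64, sd = 2.3)
childHeight <- 0.4 * father + 0.3 * mother + rnorm(n, sd = 2.5)
sim <- data.frame(father, mother, childHeight)

cm <- cor(sim)     # 3 x 3 matrix of pairwise Pearson correlations
round(cm, 2)       # symmetric, with 1s on the diagonal

pairs(sim)         # base-R scatterplot matrix of the same three variables
```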


ggplot2



Box Plot


Children’s Height vs. Gender

boxplot(childHeight ~ gender, data = h, col = c("pink", "lightblue"), main = "Children's Height by Gender", xlab = "Gender", ylab = "Height")


Descriptive Stats: Box Plot

Males: 69.23
Females: 64.10
Difference: 69.23 − 64.10 = 5.13 inches
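A group summary like this can be computed with tapply(). A minimal sketch on a hypothetical six-row data frame (invented values, not the Galton data); on h the same call would be tapply(h$childHeight, h$gender, mean). A box plot draws medians, so the last line swaps in median for comparison.

```r
# Hypothetical toy data (invented values)
toy <- data.frame(
  gender      = c("female", "male", "female", "male", "male", "female"),
  childHeight = c(63, 70, 65, 68, 69, 64)
)

group_means <- tapply(toy$childHeight, toy$gender, mean)
group_means        # female 64, male 69

# Difference between the group means (males minus females)
diff_means <- group_means[["male"]] - group_means[["female"]]
diff_means         # 5

tapply(toy$childHeight, toy$gender, median)   # medians, as drawn by a box plot
```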


Subset Males

men <- subset(h, gender == "male")


Subset Females

women <- subset(h, gender == "female")
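R accepts only straight quotes (" or ') as string delimiters, so a curly ‘female' pasted from a word processor causes a parse error. A minimal sketch on a hypothetical toy data frame (invented values), showing that subset() selects the same rows as ordinary logical indexing:

```r
# Hypothetical toy data (invented values)
toy <- data.frame(
  gender      = c("female", "male", "female", "male"),
  childHeight = c(63, 70, 65, 68)
)

men   <- subset(toy, gender == "male")     # straight quotes around "male"
women <- subset(toy, gender == "female")

# Equivalent base-R logical indexing
men2 <- toy[toy$gender == "male", ]

nrow(men)      # 2
nrow(women)    # 2
```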


Children’s Height: Males

qqnorm(men$childHeight)
qqline(men$childHeight)

hist(men$childHeight)


Children’s Height: Females

qqnorm(women$childHeight)
qqline(women$childHeight)

hist(women$childHeight)
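A Q-Q plot compares the data's quantiles with those a normal distribution would produce; points close to the qqline() suggest approximate normality. One rough numeric summary (an illustrative choice, not something these slides use) is the correlation between the two sets of quantiles, which qqnorm(plot.it = FALSE) returns without drawing. A sketch with simulated samples:

```r
set.seed(3)
normal_sample <- rnorm(200, mean = 69, sd = 2.6)   # simulated, roughly bell-shaped
skewed_sample <- rexp(200, rate = 1)               # simulated, right-skewed

# qqnorm() with plot.it = FALSE returns theoretical (x) and sample (y) quantiles
q_norm <- qqnorm(normal_sample, plot.it = FALSE)
q_skew <- qqnorm(skewed_sample, plot.it = FALSE)

# A straighter Q-Q plot gives a correlation closer to 1
r_norm <- cor(q_norm$x, q_norm$y)
r_skew <- cor(q_skew$x, q_skew$y)
c(normal = r_norm, skewed = r_skew)
```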


ggplot2

library(ggplot2)
h.bb <- ggplot(h, aes(factor(gender), childHeight))
h.bb + geom_boxplot()
h.bb + geom_boxplot(aes(fill = factor(gender)))