
BIOSTATS 640 Spring 2019 Unit 7 Introduction to Analysis of Variance (1 of 2) Solutions R Users

Sol_anova_1 of 2 R Users.docx Page 1 of 15

Unit 7 – Analysis of Variance Practice Problems

SOLUTIONS – R Users

Before you begin: Download from the course website: R Users anova_infants.Rdata fishgrowth.Rdata

Preliminaries: Set current directory. Attach libraries needed.

setwd("/Users/cbigelow/Desktop/")
options(scipen=1000)

library(DescTools)   # command Freq():       One way frequencies, relative frequencies, etc
library(descr)       # command CrossTable(): Two way contingency table
library(doBy)        # command summaryBy():  Descriptives by group
library(car)         # command leveneTest(): Test of equal variances 2 way anova

Practice with one way analysis of variance Exercises #1-6 Data set: anova_infants.Rdata

Zelazo et al. (1972) investigated the variability in age at first walking in infants. Study infants were grouped into four groups, according to reinforcement of walking and placement: (1) active (2) passive (3) no exercise; and (4) 8 week control. Sample sizes were 6 per group, for a total of n=24. For each infant, study data included group assignment and age at first walking, in months. The following are the data and consist of recorded values of age (months) by group:

Active Group   Passive Group   No-Exercise Group   8 Week Control
 9.00          11.00           11.50               13.25
 9.50          10.00           12.00               11.50
 9.75          10.00            9.00               12.00
10.00          11.75           11.50               13.50
13.00          10.50           13.25               11.50
 9.50          15.00           13.00               12.35

Source: Zelazo et al (1972) “Walking” in the newborn. Science 176: 314-315.
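
For readers who do not have anova_infants.Rdata at hand, the table above can be typed in directly. The following is a minimal sketch in base R; the object name `infants` is an assumption made for illustration (the solutions below use `dat1`, as loaded from the course file).

```r
# Hypothetical hand-entered copy of the table of ages by group.
# The name `infants` is illustrative only; the course file provides `dat1`.
infants <- data.frame(
  group = factor(rep(c("1=active", "2=passive", "3=noex", "4=control"), each = 6)),
  age = c( 9.00,  9.50,  9.75, 10.00, 13.00,  9.50,   # active
          11.00, 10.00, 10.00, 11.75, 10.50, 15.00,   # passive
          11.50, 12.00,  9.00, 11.50, 13.25, 13.00,   # no exercise
          13.25, 11.50, 12.00, 13.50, 11.50, 12.35)   # 8 week control
)

# Six infants per group, 24 in total
table(infants$group)

# Mean age at first walking (months) in each group
group_means <- tapply(infants$age, infants$group, mean)
round(group_means, 3)
```

The group means obtained this way can be checked against the descriptives reported in the solutions below.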

 


Preliminaries for Exercises 1-6: Read in data. Clean. Produce descriptives.

load(file="anova_infants.Rdata")
str(dat1)

## 'data.frame':    24 obs. of  7 variables:
##  $ group    : int  1 1 1 1 1 1 2 2 2 2 ...
##  $ age      : num  9 9.5 9.75 10 13 ...
##  $ passive  : num  0 0 0 0 0 0 1 1 1 1 ...
##  $ noex     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_active : num  1 1 1 1 1 1 0 0 0 0 ...
##  $ I_passive: num  0 0 0 0 0 0 1 1 1 1 ...
##  $ I_noex   : num  0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "datalabel")= chr ""
##  - attr(*, "time.stamp")= chr " 9 Apr 2017 16:58"
##  - attr(*, "formats")= chr  "%8.0g" "%9.0g" "%9.0g" "%9.0g" ...
##  - attr(*, "types")= int  251 254 254 254 254 254 254
##  - attr(*, "val.labels")= chr  "groupf" "" "" "" ...
##  - attr(*, "var.labels")= chr  "" "" "" "" ...
##  - attr(*, "version")= int 12

# Convert the integer group code to a labelled factor
dat1$group <- factor(dat1$group,
                     levels=c(1,2,3,4),
                     labels=c("1=active", "2=passive", "3=noex", "4=control"))

library(DescTools)
DescTools::Freq(dat1$group,digits=2,dnn=c("Group"))

##       level  freq   perc  cumfreq  cumperc
## 1  1=active     6  25.0%        6    25.0%
## 2 2=passive     6  25.0%       12    50.0%
## 3    3=noex     6  25.0%       18    75.0%
## 4 4=control     6  25.0%       24   100.0%


1. State the analysis of variance model using notation $\mu$, $\tau_i$, and $\sigma^2$ as appropriate. Define all terms and constraints on the parameters.

Answer:

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad \text{where } \varepsilon_{ij} \sim N(0, \sigma^2) \text{ and } \sum_{i=1}^{4} \tau_i = 0$$

$i = 1, 2, \ldots, K$ indexes method of reinforcement group; $K$ = number of groups = 4
$j = 1, 2, \ldots, n_i = 6$ indexes infant within group
$\mu$ = population mean age at first walking, over all groups
$\mu_i$ = mean age at first walking for infants in group "i"
$\tau_i = [\mu_i - \mu]$
$Y_{ij}$ = observed age at first walking for the jth infant in group "i"

$$H_O: \tau_1 = 0,\ \tau_2 = 0,\ \tau_3 = 0,\ \text{and } \tau_4 = 0$$
$$H_A: \text{At least one } \tau_i \neq 0$$
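
To make the parameters of this model concrete, the sketch below estimates $\mu$ and each $\tau_i$ from the sample and checks the sum-to-zero constraint. It is an illustration only, written in base R with the ages typed in by hand; the names `mu_hat`, `mu_i_hat` and `tau_hat` are assumptions introduced here. Because the design is balanced (6 infants per group), the overall sample mean equals the average of the four group means.

```r
# Ages at first walking (months), hand-entered from the table of ages by group
age <- c( 9.00,  9.50,  9.75, 10.00, 13.00,  9.50,   # active
         11.00, 10.00, 10.00, 11.75, 10.50, 15.00,   # passive
         11.50, 12.00,  9.00, 11.50, 13.25, 13.00,   # no exercise
         13.25, 11.50, 12.00, 13.50, 11.50, 12.35)   # 8 week control
group <- factor(rep(c("1=active", "2=passive", "3=noex", "4=control"), each = 6))

mu_hat   <- mean(age)                 # estimate of mu (overall mean)
mu_i_hat <- tapply(age, group, mean)  # estimates of the group means mu_i
tau_hat  <- mu_i_hat - mu_hat         # estimates of tau_i = mu_i - mu

round(tau_hat, 3)
sum(tau_hat)   # zero up to floating point error, matching the constraint
```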

2. By any means you like, produce a side by side box plot showing the distribution of age at first walking, separately for each of the 4 groups.  

# Descriptives
summaryBy(age~group,data=dat1,
          FUN=c(length,mean,var,sd,min,max),
          fun.names=c("n","Mean", "Var", "SD", "Min", "Max"))

##       group age.n age.Mean  age.Var    age.SD age.Min age.Max
## 1  1=active     6 10.12500 2.093750 1.4469796     9.0   13.00
## 2 2=passive     6 11.37500 3.593750 1.8957189    10.0   15.00
## 3    3=noex     6 11.70833 2.310417 1.5200055     9.0   13.25
## 4 4=control     6 12.35000 0.740000 0.8602325    11.5   13.50

 

 

 

 

 

 

 

 


# Side-by-side box plot
boxplot(age~group,data=dat1,
        main="Age (Month) at First Walking",
        xlab="Group",
        ylab="Month")

 

Exercise #3
- In these data, first walking occurs earlier when infants are reinforced.
- Distributions differ markedly with respect to variability, with the greatest variability seen among infants in the passive group and the smallest among infants in the control group.
- Distributions also differ markedly in their patterns of symmetry, with long right tails in the active and passive groups, a long left tail in the no-exercise group, and symmetry in controls.
- The plot confirms impressions from the descriptive statistics.


3. By any means you like, obtain the entries of the analysis of variance table for this one way analysis of variance. Use your computer output (or excel work or hand calculations or whatever) to complete the following table:

Source             df   Sum of Squares (SSQ)   Mean Square (MSQ)   F-Statistic   p-value
Between Groups      3   15.74                  5.25                2.40          .10
Within Groups      20   43.69                  2.18
Total, corrected   23   59.43

q3anova <- aov(age~group, data=dat1)
anova(q3anova)

## Analysis of Variance Table
##
## Response: age
##           Df Sum Sq Mean Sq F value  Pr(>F)
## group      3  15.74  5.2468  2.4018 0.09787 .
## Residuals 20  43.69  2.1845
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

bartlett.test(age ~ group, data=dat1)

##
##  Bartlett test of homogeneity of variances
##
## data:  age by group
## Bartlett's K-squared = 2.6355, df = 3, p-value = 0.4513
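
Since the exercise allows hand calculation, the entries of the table can also be obtained from the definitions of the between-group and within-group sums of squares. The following base R sketch is one way to do this, with the ages typed in by hand; variable names such as `ss_between` and `ss_within` are assumptions introduced for illustration.

```r
# Ages at first walking (months), hand-entered from the table of ages by group
age <- c( 9.00,  9.50,  9.75, 10.00, 13.00,  9.50,   # active
         11.00, 10.00, 10.00, 11.75, 10.50, 15.00,   # passive
         11.50, 12.00,  9.00, 11.50, 13.25, 13.00,   # no exercise
         13.25, 11.50, 12.00, 13.50, 11.50, 12.35)   # 8 week control
group <- factor(rep(c("1=active", "2=passive", "3=noex", "4=control"), each = 6))

n     <- length(age)         # total sample size, 24
K     <- nlevels(group)      # number of groups, 4
grand <- mean(age)           # overall mean
n_i   <- tapply(age, group, length)
m_i   <- tapply(age, group, mean)

# Sums of squares
ss_between <- sum(n_i * (m_i - grand)^2)      # between groups
ss_within  <- sum((age - ave(age, group))^2)  # within groups; ave() returns each infant's group mean
ss_total   <- sum((age - grand)^2)            # total, corrected

# Degrees of freedom, mean squares, F statistic and p-value
df_between <- K - 1
df_within  <- n - K
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat     <- ms_between / ms_within
p_value    <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(SSB = ss_between, SSW = ss_within, SST = ss_total,
        MSB = ms_between, MSW = ms_within, F = f_stat, p = p_value), 4)
```

The between and within sums of squares add up to the corrected total, which is a useful check on the arithmetic.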

 

4. Write a 2-5 sentence report of your description and hypothesis test findings using language as appropriate for a client who is intelligent but is not knowledgeable about statistics. Include figure and table as you think is appropriate.

In this sample, the data suggest a trend towards earlier age at first walking with increasing reinforcement and placement. The mean age at first walking is greatest among controls (12.35 months) and lowest among infants in the “active” group (10.13 months); see also the box plots. Tests of statistical significance were limited to the overall F test for group differences, and this did not achieve statistical significance (p-value = .10), possibly due to the small sample sizes (6 in each group). Interestingly, examination of the data also suggests that the variability in age at first walking differed, depending on the intervention received. The variability was greater in the three intervention groups (“active”, “passive”, “no exercise”) than in the “control” group; however, this difference was not statistically significant (p-value = .45). Further study, utilizing larger sample sizes and additional hypothesis tests to investigate trend, is needed.


5. For the brave: Using appropriately defined indicator variables, perform a multivariable linear regression analysis of these same data! Use your computer output to complete the following table:

Source                 df             Sum of Squares                                          Mean Square           Overall F
due model              (p) = 3        $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ = 15.74   SSR/p = 5.25          2.40
due error (residual)   (n-1-p) = 20   $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ = 43.69       SSE/(n-1-p) = 2.18
Total, corrected       (n-1) = 23     $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ = 59.43

Some of this has already been done for you: I considered the following parameterization

Y = age at first walking
I_act = 0/1 indicator of group assignment to “active”
I_pass = 0/1 indicator of group assignment to “passive”
I_noex = 0/1 indicator of group assignment to “no exercise”

Thus, I used a reference cell coding approach with “8 week control” as my reference. I fit the following multivariable linear model of Y:

Y = β0 + β1 [I_act] + β2 [I_pass] + β3 [I_noex] + error

 

dat1$I_active <- factor(dat1$I_active)
dat1$I_passive <- factor(dat1$I_passive)
dat1$I_noex <- factor(dat1$I_noex)
q5model <- lm(age ~ I_active + I_passive + I_noex, data=dat1)
anova(q5model)

## Analysis of Variance Table
##
## Response: age
##           Df Sum Sq Mean Sq F value  Pr(>F)
## I_active   1 12.793 12.7934  5.8565 0.02516 *
## I_passive  1  1.712  1.7117  0.7836 0.38656
## I_noex     1  1.235  1.2352  0.5654 0.46083
## Residuals 20 43.690  2.1845
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

 


 

summary(q5model)

##
## Call:
## lm(formula = age ~ I_active + I_passive + I_noex, data = dat1)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.7083 -0.8500 -0.2792  0.5062  3.6250
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  12.3500     0.6034  20.468 6.94e-15 ***
## I_active1    -2.2250     0.8533  -2.607   0.0169 *
## I_passive1   -0.9750     0.8533  -1.143   0.2667
## I_noex1      -0.6417     0.8533  -0.752   0.4608
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.478 on 20 degrees of freedom
## Multiple R-squared:  0.2649, Adjusted R-squared:  0.1546
## F-statistic: 2.402 on 3 and 20 DF,  p-value: 0.09787

The prediction equation is thus:

Y = 12.35 - 2.225*I_active - 0.97*I_passive - 0.64*I_noex

The two analyses match (hooray), thus confirming that a multiple linear regression model utilizing appropriately defined indicator variables is equivalent to an analysis of variance.
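
The equivalence can also be seen without building the indicators by hand. As a sketch (not the approach used in the solutions above), the group factor can be releveled so that the control group is the reference category, after which `lm()` creates the indicator columns itself. The ages are typed in by hand, and the names `group_ref` and `fit` are assumptions introduced here.

```r
# Ages at first walking (months), hand-entered from the table of ages by group
age <- c( 9.00,  9.50,  9.75, 10.00, 13.00,  9.50,   # active
         11.00, 10.00, 10.00, 11.75, 10.50, 15.00,   # passive
         11.50, 12.00,  9.00, 11.50, 13.25, 13.00,   # no exercise
         13.25, 11.50, 12.00, 13.50, 11.50, 12.35)   # 8 week control
group <- factor(rep(c("1=active", "2=passive", "3=noex", "4=control"), each = 6))

# Make "4=control" the reference cell, as in the parameterization above
group_ref <- relevel(group, ref = "4=control")

# lm() expands the factor into 0/1 indicators for the non-reference groups
fit <- lm(age ~ group_ref)

coef(fit)                # intercept = control mean; slopes = differences from control
summary(fit)$fstatistic  # overall F on 3 and 20 df
```

Each slope is the difference between a group mean and the control mean, so adding it to the intercept recovers that group's mean.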

6. For the brave: Using your output from your two analyses (1st-analysis of variance, 2nd – regression), obtain the predicted mean of Y =age at first walking twice in two ways.

Group         Prediction Using One Way Analysis of Variance   Prediction Using Multiple Linear Regression
Active        10.125                                          $\hat{\mu}_1 = (\hat{\beta}_0 + \hat{\beta}_1)$ = 12.35 - 2.225 = 10.125
Passive       11.375                                          $\hat{\mu}_2 = (\hat{\beta}_0 + \hat{\beta}_2)$ = 12.35 - 0.97 = 11.38
No-Exercise   11.71                                           $\hat{\mu}_3 = (\hat{\beta}_0 + \hat{\beta}_3)$ = 12.35 - 0.64 = 11.71
Control       12.35                                           $\hat{\mu}_4 = \hat{\beta}_0$ = 12.35


Question #6 For the brave - Obtain Predicted Values: a) anova b) regression

# Predicted Means from ANOVA
aactive <- predict(q3anova,data.frame(group="1=active"))
apassive <- predict(q3anova,data.frame(group="2=passive"))
anoex <- predict(q3anova,data.frame(group="3=noex"))
acontrol <- predict(q3anova,data.frame(group="4=control"))

anames <- c("Active", "Passive", "NoEx", "Control")
ahat <- c(aactive,apassive,anoex,acontrol)
means.anova <- data.frame(anames,ahat)
means.anova

##    anames     ahat
## 1  Active 10.12500
## 2 Passive 11.37500
## 3    NoEx 11.70833
## 4 Control 12.35000

# Predicted Means from Regression
active <- predict(q5model,data.frame(I_active="1",I_passive="0",I_noex="0"))
passive <- predict(q5model,data.frame(I_active="0",I_passive="1",I_noex="0"))
noex <- predict(q5model,data.frame(I_active="0",I_passive="0",I_noex="1"))
control <- predict(q5model,data.frame(I_active="0",I_passive="0",I_noex="0"))

names <- c("Active", "Passive", "NoEx", "Control")
yhat <- c(active,passive,noex,control)
means.regression <- data.frame(names,yhat)
means.regression

##     names     yhat
## 1  Active 10.12500
## 2 Passive 11.37500
## 3    NoEx 11.70833
## 4 Control 12.35000


Practice with two-way factorial analysis of variance Exercises #7-12 Data set used: fishgrowth.Rdata

Consider again the fish growth data on page 36 of Notes 7. Introduction to Analysis of Variance.

Light (light)   Water Temp (temp)   Fish Growth (growth)
1=low           1=cold              4.55
1=low           1=cold              4.24
1=low           2=lukewarm          4.89
1=low           2=lukewarm          4.88
1=low           3=warm              5.01
1=low           3=warm              5.11
2=high          1=cold              5.55
2=high          1=cold              4.08
2=high          2=lukewarm          6.09
2=high          2=lukewarm          5.01
2=high          3=warm              7.01
2=high          3=warm              6.92

Coding Manual:

Variable   Coding
growth     continuous
light      1 = low
           2 = high
temp       1 = cold
           2 = lukewarm
           3 = warm


Preliminaries for Exercises 7-12: Read in data. Clean. Descriptives

load(file="fishgrowth.Rdata")
str(fishgrowth)

## 'data.frame':    12 obs. of  3 variables:
##  $ light : Factor w/ 2 levels "1=low","2=high": 1 1 1 1 1 1 2 2 2 2 ...
##  $ temp  : Factor w/ 3 levels "1=cold","2=lukewarm",..: 1 1 2 2 3 3 1 1 2 2 ...
##  $ growth: num  4.55 4.24 4.89 4.88 5.01 ...
##  - attr(*, "datalabel")= chr ""
##  - attr(*, "time.stamp")= chr " 9 Apr 2017 16:59"
##  - attr(*, "formats")= chr  "%8.0g" "%10.0g" "%9.0g"
##  - attr(*, "types")= int  251 251 254
##  - attr(*, "val.labels")= chr  "lightf" "tempf" ""
##  - attr(*, "var.labels")= chr  "" "" ""
##  - attr(*, "version")= int 12
##  - attr(*, "label.table")=List of 2
##   ..$ lightf: Named int  1 2
##   .. ..- attr(*, "names")= chr  "1=low" "2=high"
##   ..$ tempf : Named int  1 2 3
##   .. ..- attr(*, "names")= chr  "1=cold" "2=lukewarm" "3=warm"

# Crosstab of factors I and II
CrossTable(fishgrowth$light,fishgrowth$temp,dnn=c("Light","Temp"),
           format=c("SAS"),prop.r=FALSE, prop.c=FALSE,prop.t=FALSE,
           prop.chisq=FALSE)

##    Cell Contents
## |-------------------------|
## |                       N |
## |-------------------------|
##
## ==============================================
##           Temp
## Light     1=cold   2=lukewarm   3=warm   Total
## ----------------------------------------------
## 1=low          2            2        2       6
## ----------------------------------------------
## 2=high         2            2        2       6
## ----------------------------------------------
## Total          4            4        4      12
## ==============================================

 

 

 

 

 

 

 

 

 

 


# Descriptives on y=growth in all 6 groups (2 light levels x 3 temperatures)
summaryBy(growth ~ light + temp, data = fishgrowth,
          FUN=c(length,mean,var,sd,min,max),
          fun.names=c("n","Mean", "Var", "SD", "Min", "Max"))

##    light       temp growth.n growth.Mean   growth.Var   growth.SD
## 1  1=low     1=cold        2       4.395 4.805013e-02 0.219203399
## 2  1=low 2=lukewarm        2       4.885 4.999752e-05 0.007070892
## 3  1=low     3=warm        2       5.060 4.999990e-03 0.070710611
## 4 2=high     1=cold        2       4.815 1.080450e+00 1.039447157
## 5 2=high 2=lukewarm        2       5.550 5.831999e-01 0.763675270
## 6 2=high     3=warm        2       6.965 4.050014e-03 0.063639718
##   growth.Min growth.Max
## 1       4.24       4.55
## 2       4.88       4.89
## 3       5.01       5.11
## 4       4.08       5.55
## 5       5.01       6.09
## 6       6.92       7.01

 

7. State the analysis of variance model using notation $\mu$, $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$ and $\sigma^2$ as appropriate. Define all terms and constraints on the parameters.

Answer:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad \text{where } \varepsilon_{ijk} \sim N(0, \sigma^2)$$

with
$i = 1, 2$ indexing light
$j = 1, 2, 3$ indexing temperature
$k = 1, 2$ indexing individual fish under light "i" at water temperature "j"
$\mu$ = population mean fish growth, over all groups
$\alpha_i$ = effect of light level "i", with $\sum_{i=1}^{2} \alpha_i = 0$
$\beta_j$ = effect of water temperature "j", with $\sum_{j=1}^{3} \beta_j = 0$
$(\alpha\beta)_{ij}$ = interaction effect of the combination of "ith" light level and "jth" water temperature, with $\sum_{i=1}^{2} (\alpha\beta)_{ij} = 0$ and $\sum_{j=1}^{3} (\alpha\beta)_{ij} = 0$.
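
As an illustration of these constraints, the sketch below estimates the terms of the two way model from the table of cell means and verifies that the estimated effects sum to zero. It is written in base R with the fish growth data typed in by hand; the names `cell`, `alpha_hat`, `beta_hat` and `ab_hat` are assumptions introduced here. The simple averaging used relies on the design being balanced (2 fish per cell).

```r
# Fish growth data, hand-entered from the data listing
light  <- factor(rep(c("1=low", "2=high"), each = 6))
temp   <- factor(rep(rep(c("1=cold", "2=lukewarm", "3=warm"), each = 2), times = 2))
growth <- c(4.55, 4.24, 4.89, 4.88, 5.01, 5.11,
            5.55, 4.08, 6.09, 5.01, 7.01, 6.92)

cell  <- tapply(growth, list(light, temp), mean)  # 2 x 3 table of cell means
grand <- mean(cell)                               # estimate of mu (balanced design)

alpha_hat <- rowMeans(cell) - grand               # light effects
beta_hat  <- colMeans(cell) - grand               # temperature effects
ab_hat    <- cell - grand - outer(alpha_hat, beta_hat, "+")  # interaction effects

round(alpha_hat, 4)
round(beta_hat, 4)
round(ab_hat, 4)

# Sum-to-zero constraints (all zero up to floating point error)
sum(alpha_hat)
sum(beta_hat)
rowSums(ab_hat)
colSums(ab_hat)
```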


8. Create the following new variables with accompanying definitions

Variable   Coding
i_high     = 1 if (light is 2)
             0 otherwise
i_luke     = 1 if (temp is 2)
             0 otherwise
i_warm     = 1 if (temp is 3)
             0 otherwise
hi_luke    = (i_high) * (i_luke)
hi_warm    = (i_high) * (i_warm)

fishgrowth$i_high <- ifelse(fishgrowth$light=="2=high",1,0)
fishgrowth$i_luke <- ifelse(fishgrowth$temp=="2=lukewarm",1,0)
fishgrowth$i_warm <- ifelse(fishgrowth$temp=="3=warm",1,0)
fishgrowth$hi_luke <- fishgrowth$i_high*fishgrowth$i_luke
fishgrowth$hi_warm <- fishgrowth$i_high*fishgrowth$i_warm
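
The hand-built indicators correspond to the design matrix that R constructs for `light * temp` under its default treatment (reference cell) coding. The sketch below checks this in base R, with the two factors typed in by hand; it assumes the default `contr.treatment` contrasts are in effect, and the names `X` and `hand` are introduced for illustration.

```r
# Factors hand-entered in the same order as the data listing
light <- factor(rep(c("1=low", "2=high"), each = 6))
temp  <- factor(rep(rep(c("1=cold", "2=lukewarm", "3=warm"), each = 2), times = 2))

# Indicators built by hand, as in the exercise
i_high  <- ifelse(light == "2=high", 1, 0)
i_luke  <- ifelse(temp == "2=lukewarm", 1, 0)
i_warm  <- ifelse(temp == "3=warm", 1, 0)
hi_luke <- i_high * i_luke
hi_warm <- i_high * i_warm
hand <- cbind(intercept = 1, i_high, i_luke, i_warm, hi_luke, hi_warm)

# Design matrix built by R (assumes default treatment contrasts)
X <- model.matrix(~ light * temp)
colnames(X)

# TRUE when every column of R's design matrix equals the hand-built version
same_design <- all(X == hand)
same_design
```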

 

9. Perform a two way analysis of variance so as to reproduce the following table

Source              Df   SSQ     MSQ     F       p-value
Due LIGHT            1   2.98    2.98    10.39   .018
Due TEMP             2   3.984   1.992   6.95    .027
Due Interaction      2   1.268   0.634   2.21    .191
Error                6   1.721   0.287
Total (Corrected)   11   9.953   -

 

fishgrowth$light <- as.factor(fishgrowth$light)
fishgrowth$temp <- as.factor(fishgrowth$temp)
q9anova <- aov(growth ~ light + temp + light:temp, data=fishgrowth)
anova(q9anova)

## Analysis of Variance Table
##
## Response: growth
##            Df Sum Sq Mean Sq F value  Pr(>F)
## light       1 2.9800 2.98003 10.3906 0.01806 *
## temp        2 3.9843 1.99216  6.9462 0.02744 *
## light:temp  2 1.2676 0.63381  2.2099 0.19093
## Residuals   6 1.7208 0.28680
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

 


leveneTest(growth ~ light*temp, data=fishgrowth)

## Warning in anova.lm(lm(resp ~ group)): ANOVA F-tests on an essentially
## perfect fit are unreliable

## Levene's Test for Homogeneity of Variance (center = median)
##       Df    F value    Pr(>F)
## group  5 2.7375e+31 < 2.2e-16 ***
##        6
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
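
The warning and the enormous F value can be traced to having only two fish per cell. With two observations, the cell median lies midway between them, so both absolute deviations from the median are identical, and the analysis of variance that Levene's test runs on those deviations has essentially no within-cell variation. The base R sketch below illustrates this with the data typed in by hand; the names `cell` and `abs_dev` are assumptions introduced here, and the result should be read as an explanation of why this test output is not informative for these data.

```r
# Fish growth data, hand-entered from the data listing
light  <- factor(rep(c("1=low", "2=high"), each = 6))
temp   <- factor(rep(rep(c("1=cold", "2=lukewarm", "3=warm"), each = 2), times = 2))
growth <- c(4.55, 4.24, 4.89, 4.88, 5.01, 5.11,
            5.55, 4.08, 6.09, 5.01, 7.01, 6.92)

# One label per light x temperature cell (6 cells, 2 fish each)
cell <- interaction(light, temp)

# Absolute deviations from the cell medians: the response used by
# the median-centered Levene test
abs_dev <- abs(growth - ave(growth, cell, FUN = median))

# Within every cell the two deviations coincide, so their spread is zero
# (up to floating point error)
spread <- tapply(abs_dev, cell, function(d) diff(range(d)))
spread
```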

 

10. Perform a regression analysis to obtain the following table and estimates

Source              Df   SSQ      MSQ      F      p-value
Due Model            5   8.2320   1.6464   5.74   .0276
Due Residual         6   1.7208   0.2868   -
Total (Corrected)   11   9.953    -

Predictor   beta    Se(beta)   T=beta/se   p-value
I_high      0.42    0.5355     0.78        0.46
I_luke      0.49    0.5355     0.91        0.395
I_warm      0.665   0.5355     1.24        0.261
Hi_luke     0.245   0.7574     0.32        0.757
Hi_warm     1.485   0.7574     1.96        0.098
Intercept   4.395   0.3787     11.61       0.000

 

q10model <- lm(growth ~ i_high + i_luke + i_warm + hi_luke + hi_warm, data=fishgrowth)
anova(q10model)

## Analysis of Variance Table
##
## Response: growth
##           Df Sum Sq Mean Sq F value   Pr(>F)
## i_high     1 2.9800  2.9800 10.3906 0.018059 *
## i_luke     1 0.0222  0.0222  0.0774 0.790164
## i_warm     1 3.9621  3.9621 13.8149 0.009889 **
## hi_luke    1 0.1650  0.1650  0.5753 0.476877
## hi_warm    1 1.1026  1.1026  3.8445 0.097594 .
## Residuals  6 1.7208  0.2868
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

 

 

 


summary(q10model)

##
## Call:
## lm(formula = growth ~ i_high + i_luke + i_warm + hi_luke + hi_warm,
##     data = dat2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.73500 -0.07625  0.00000  0.07625  0.73500
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   4.3950     0.3787  11.606 2.46e-05 ***
## i_high        0.4200     0.5355   0.784   0.4627
## i_luke        0.4900     0.5355   0.915   0.3955
## i_warm        0.6650     0.5355   1.242   0.2607
## hi_luke       0.2450     0.7574   0.323   0.7573
## hi_warm       1.4850     0.7574   1.961   0.0976 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5355 on 6 degrees of freedom
## Multiple R-squared:  0.8271, Adjusted R-squared:  0.683
## F-statistic: 5.741 on 5 and 6 DF,  p-value: 0.02755

 

11. Using the output from each of your anova and regression analyses, complete the following tables and notice that they are the same.  

# Predicted Means from ANOVA
ameans <- data.frame(fishgrowth$temp,fishgrowth$light, fitted=predict(q9anova))
ameans

##     dat2.temp dat2.light fitted
## 1      1=cold      1=low  4.395
## 2      1=cold      1=low  4.395
## 3  2=lukewarm      1=low  4.885
## 4  2=lukewarm      1=low  4.885
## 5      3=warm      1=low  5.060
## 6      3=warm      1=low  5.060
## 7      1=cold     2=high  4.815
## 8      1=cold     2=high  4.815
## 9  2=lukewarm     2=high  5.550
## 10 2=lukewarm     2=high  5.550
## 11     3=warm     2=high  6.965
## 12     3=warm     2=high  6.965

 


Estimated Mean Growth, by Conditions of Light and Temperature - Anova Analysis

             Cold    Lukewarm   Warm
Low light    4.395   4.885      5.06
High light   4.815   5.55       6.965

 

# Predicted Means from Regression
rmeans <- data.frame(fishgrowth$temp,fishgrowth$light, fitted=predict(q10model))
rmeans

##     dat2.temp dat2.light fitted
## 1      1=cold      1=low  4.395
## 2      1=cold      1=low  4.395
## 3  2=lukewarm      1=low  4.885
## 4  2=lukewarm      1=low  4.885
## 5      3=warm      1=low  5.060
## 6      3=warm      1=low  5.060
## 7      1=cold     2=high  4.815
## 8      1=cold     2=high  4.815
## 9  2=lukewarm     2=high  5.550
## 10 2=lukewarm     2=high  5.550
## 11     3=warm     2=high  6.965
## 12     3=warm     2=high  6.965

 

Estimated Mean Growth, by Conditions of Light and Temperature - Regression Analysis

             Cold    Lukewarm   Warm
Low light    4.395   4.885      5.06
High light   4.815   5.55       6.965

12. Compare your answer to #11 with the observed means.

Observed Mean Growth, by Conditions of Light and Temperature

             Cold    Lukewarm   Warm
Low light    4.395   4.885      5.06
High light   4.815   5.55       6.965

They match!
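
The match is expected: a model with both main effects and the interaction has one parameter per cell, so its fitted values are the observed cell means. A minimal base R sketch of the comparison follows, with the data typed in by hand; the names `obs_means` and `fit` are assumptions introduced for illustration.

```r
# Fish growth data, hand-entered from the data listing
light  <- factor(rep(c("1=low", "2=high"), each = 6))
temp   <- factor(rep(rep(c("1=cold", "2=lukewarm", "3=warm"), each = 2), times = 2))
growth <- c(4.55, 4.24, 4.89, 4.88, 5.01, 5.11,
            5.55, 4.08, 6.09, 5.01, 7.01, 6.92)

# Observed cell means, laid out as light (rows) by temperature (columns)
obs_means <- tapply(growth, list(light, temp), mean)
round(obs_means, 3)

# Fitted values from the full two way model with interaction
fit <- lm(growth ~ light * temp)

# Largest gap between a fitted value and its observed cell mean
max_gap <- max(abs(fitted(fit) - ave(growth, light, temp)))
max_gap   # zero up to floating point error
```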