BIO5312 Biostatistics
R Session 11: Multisample Hypothesis Testing II

Dr. Junchao Xia
Center of Biophysics and Computational Biology
Fall 2016
11/8/2016


Generating Box Plots for Different Groups

Loading data and building 3 different groups

# set the working directory
> setwd("C:/Users/Junchao/Desktop/Biostatistics_5312/2016/lab_11")
# read data from the data file
> lead = read.table("LEAD.DAT.txt", header=TRUE)
# drop individuals whose maxfwt is coded 99 (missing value)
> ids = lead$maxfwt != 99
# get the maximum finger-wrist tapping score
> fwt = lead$maxfwt[ids]
# obtain a factor using the group IDs
> grp = factor(lead$lead_grp[ids])
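The filtering step can be tried without the course data file. The sketch below uses a small made-up data frame (the name `toy` and all values are assumptions for illustration, not taken from LEAD.DAT.txt). It follows the same pattern: build one logical index, then apply it to both the response and the grouping column so the two stay aligned.

```r
# Toy data: column names mirror the slide, values are invented
toy <- data.frame(
  maxfwt   = c(72, 99, 61, 48, 99, 55),
  lead_grp = c(1, 1, 2, 2, 3, 3)
)

# TRUE for the rows to keep (99 is the missing-value code)
keep <- toy$maxfwt != 99

fwt_toy <- toy$maxfwt[keep]
grp_toy <- factor(toy$lead_grp[keep])

# Alternative: recode 99 to NA and rely on R's NA handling later
toy$maxfwt[toy$maxfwt == 99] <- NA

table(grp_toy)
```

Recoding to `NA` is often the safer habit, since functions such as `lm()` drop incomplete rows by default, whereas a stray 99 would silently enter the analysis as a real score.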


Box Plots from Different Groups

# generate the boxplots for three different groups

>boxplot(fwt~grp,xlab="Lead group",ylab="fwt")


Fit a Multiple Linear Model

# fit a multiple linear model
> fit1 = lm(fwt ~ grp)
> summary(fit1)

Call:
lm(formula = fwt ~ grp)

Residuals:
    Min      1Q  Median      3Q     Max
-41.438  -5.750   0.000   7.531  31.500

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   54.438      1.539  35.370  < 2e-16 ***
grp2         -10.438      3.217  -3.245  0.00162 **
grp3          -2.938      3.441  -0.854  0.39548

Residual standard error: 12.31 on 96 degrees of freedom
Multiple R-squared: 0.09905, Adjusted R-squared: 0.08028
F-statistic: 5.277 on 2 and 96 DF, p-value: 0.006692
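Under R's default treatment contrasts, the intercept is the mean of the reference group (group 1) and each `grp` coefficient is that group's mean minus the reference mean. Read this way, the output above implies group means of about 54.4, 44.0 and 51.5. The sketch below checks the relationship on simulated data; the group sizes, means and seed are assumptions chosen for illustration, not the lead study values.

```r
set.seed(1)  # simulated data, not the lead study

g <- factor(rep(1:3, times = c(30, 12, 10)))
y <- rnorm(length(g), mean = c(54, 44, 51)[as.integer(g)], sd = 12)

fit <- lm(y ~ g)
group_means <- tapply(y, g, mean)

# Intercept = mean of the reference level; other coefficients = differences
from_means <- c(group_means[1],
                group_means[2] - group_means[1],
                group_means[3] - group_means[1])

cbind(lm_coef = coef(fit), from_means = from_means)
```

The two columns should agree to numerical precision, which is why the t test on `grp2` in `summary()` is a test of group 2 against group 1 rather than against the overall mean.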


Fit a Multiple Linear Model

# residual plots
> par(mfrow=c(2,2))
> plot(fit1)


One-Way ANOVA

# get the overall F test
> anova(fit1)
Analysis of Variance Table

Response: fwt
          Df  Sum Sq Mean Sq F value   Pr(>F)
grp        2  1600.1  800.04  5.2773 0.006692 **
Residuals 96 14553.8  151.60

# calculate the variance-covariance matrix for a fitted model
> vcov(fit1)
            (Intercept)      grp2      grp3
(Intercept)    2.368774 -2.368774 -2.368774
grp2          -2.368774 10.347804  2.368774
grp3          -2.368774  2.368774 11.843872
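The two outputs above connect directly to the earlier `summary(fit1)` table. As a rough check using only the printed numbers: the F value is the ratio of the two mean squares, the square roots of the diagonal of `vcov(fit1)` are the coefficient standard errors, and the off-diagonal term is what is needed to compare group 2 with group 3, a contrast that `summary()` does not print. The variable names below are illustrative.

```r
# Numbers copied from the anova() and vcov() output above
ms_grp <- 800.04
ms_res <- 151.60
F_val  <- ms_grp / ms_res                          # ratio of mean squares
p_val  <- pf(F_val, df1 = 2, df2 = 96, lower.tail = FALSE)

labs <- c("(Intercept)", "grp2", "grp3")
V <- matrix(c( 2.368774, -2.368774, -2.368774,
              -2.368774, 10.347804,  2.368774,
              -2.368774,  2.368774, 11.843872),
            nrow = 3, byrow = TRUE, dimnames = list(labs, labs))

se <- sqrt(diag(V))                                # coefficient standard errors

# Group 2 versus group 3: Var(b2 - b3) = Var(b2) + Var(b3) - 2 Cov(b2, b3)
est_23 <- -10.438 - (-2.938)
se_23  <- sqrt(V["grp2", "grp2"] + V["grp3", "grp3"] - 2 * V["grp2", "grp3"])
t_23   <- est_23 / se_23
p_23   <- 2 * pt(abs(t_23), df = 96, lower.tail = FALSE)

round(c(F = F_val, p = p_val), 4)
round(se, 3)
round(c(t_23 = t_23, p_23 = p_23), 4)
```

The standard errors should reproduce the `Std. Error` column of `summary(fit1)` up to rounding, and the group 2 versus group 3 p-value should land near 0.076, the same comparison that the pairwise t tests later in the session report.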


Least Significant Difference (LSD) Procedure

# LSD procedure from R session 07
> xbar <- tapply(fwt, grp, mean, na.rm = TRUE)   # group means
> s <- tapply(fwt, grp, sd, na.rm = TRUE)        # group s.d.
> n <- tapply(!is.na(fwt), grp, sum)             # group sample sizes
> degf <- n - 1                                  # d.f. of each group
> total.degf <- sum(degf)                        # total d.f.
> ## the pooled standard deviation
> pooled.sd <- sqrt(sum(s^2 * degf)/total.degf)
> # for pair i and j
> i=1; j=2
> dif <- xbar[i] - xbar[j]
> se.dif <- pooled.sd * sqrt(1/n[i] + 1/n[j])
> t.val <- dif/se.dif                            # test statistic
> t.val
3.244684
> 2 * pt(abs(t.val), total.degf, lower.tail=F)   # two-sided p-value
0.001618783
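The steps above can be wrapped in a small function and compared against the built-in `pairwise.t.test()`. This is a minimal sketch on simulated data: `lsd_pair` is a hypothetical helper name, and the group sizes, means and seed are assumptions for illustration.

```r
set.seed(2)  # simulated data for illustration only

g <- factor(rep(1:3, times = c(25, 15, 12)))
y <- rnorm(length(g), mean = c(54, 44, 51)[as.integer(g)], sd = 12)

# Hypothetical helper: unadjusted pooled-SD t test for group levels i and j
lsd_pair <- function(y, g, i, j) {
  xbar <- tapply(y, g, mean)
  s    <- tapply(y, g, sd)
  n    <- tapply(y, g, length)
  degf <- sum(n - 1)
  pooled_sd <- sqrt(sum(s^2 * (n - 1)) / degf)
  t_val <- (xbar[i] - xbar[j]) / (pooled_sd * sqrt(1 / n[i] + 1 / n[j]))
  2 * pt(abs(t_val), degf, lower.tail = FALSE)
}

by_hand <- c(lsd_pair(y, g, 1, 2), lsd_pair(y, g, 1, 3), lsd_pair(y, g, 2, 3))
builtin <- pairwise.t.test(y, g, p.adjust.method = "none")$p.value

by_hand
builtin
```

The three hand-computed p-values should match the lower triangle of the `builtin` matrix, because the pooled SD uses all groups even when only two are being compared.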


Multiple Comparisons of Groups

> help(pairwise.t.test)

Pairwise t tests

Description
Calculate pairwise comparisons between group levels with corrections for multiple testing

Usage
pairwise.t.test(x, g, p.adjust.method = p.adjust.methods,
                pool.sd = !paired, paired = FALSE,
                alternative = c("two.sided", "less", "greater"), ...)

Arguments
x                response vector.
g                grouping vector or factor.
p.adjust.method  Method for adjusting p values (see p.adjust).
pool.sd          switch to allow/disallow the use of a pooled SD
paired           a logical indicating whether you want paired t-tests.
alternative      a character string specifying the alternative hypothesis, must be
                 one of "two.sided" (default), "greater" or "less". Can be abbreviated.
...              additional arguments to pass to t.test.


Fisher LSD Test

> pairwise.t.test(fwt, grp, p.adjust.method = "none")

Pairwise comparisons using t tests with pooled SD

data: fwt and grp

  1      2
2 0.0016 -
3 0.3955 0.0758

P value adjustment method: none


P-Value Adjustment for Multiple Comparisons

> help("p.adjust")

Adjust P-values for Multiple Comparisons

Description
Given a set of p-values, returns p-values adjusted using one of several methods.

Usage
p.adjust(p, method = p.adjust.methods, n = length(p))

p.adjust.methods
# c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
#   "fdr", "none")

Arguments
p       numeric vector of p-values (possibly with NAs). Any other R object is coerced
        by as.numeric.
method  correction method. Can be abbreviated.
n       number of comparisons, must be at least length(p); only set this (to
        non-default) when you know what you are doing!


P-Value Adjustment for Multiple Comparisons

# Bonferroni approach
> pairwise.t.test(fwt, grp, p.adjust.method = "bonferroni")

Pairwise comparisons using t tests with pooled SD

data: fwt and grp

  1      2
2 0.0049 -
3 1.0000 0.2273

P value adjustment method: bonferroni

# False discovery rate
> pairwise.t.test(fwt, grp, p.adjust.method = "fdr")

Pairwise comparisons using t tests with pooled SD

data: fwt and grp

  1      2
2 0.0049 -
3 0.3955 0.1137

P value adjustment method: fdr
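Both adjustments can be reproduced by hand from the unadjusted p-values. The sketch below takes the unrounded p-value for pair (1,2) from the LSD slide and the other two as printed to four decimals, so the last digit may differ slightly from the tables above. The names `p_raw`, `bonf_hand` and `bh_hand` are illustrative.

```r
# Unadjusted p-values for the pairs (1,2), (1,3), (2,3)
p_raw <- c("1-2" = 0.001618783, "1-3" = 0.3955, "2-3" = 0.0758)
m <- length(p_raw)

# Bonferroni: multiply by the number of comparisons, cap at 1
bonf_hand <- pmin(p_raw * m, 1)

# Benjamini-Hochberg ("fdr"): scale the i-th smallest p by m / i, then
# enforce monotonicity working down from the largest p-value
o <- order(p_raw, decreasing = TRUE)
bh_sorted <- pmin(cummin(p_raw[o] * m / (m:1)), 1)
bh_hand <- bh_sorted[order(o)]       # back to the original order

rbind(bonferroni = bonf_hand, fdr = bh_hand)

# Same result from the built-in function
rbind(bonferroni = p.adjust(p_raw, "bonferroni"),
      fdr        = p.adjust(p_raw, "fdr"))
```

With three comparisons the smallest p-value is multiplied by 3 under both methods, which is why the (1,2) entry is 0.0049 in both tables, while the largest p-value is left unchanged by the FDR adjustment and capped at 1 by Bonferroni.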


Two-Way ANOVA: No Interaction Effect

# add the sex as another category
> sex = factor(lead$sex[ids])
> fit2 = lm(fwt ~ grp + sex)
> summary(fit2)

Call:
lm(formula = fwt ~ grp + sex)

Residuals:
    Min      1Q  Median      3Q     Max
-40.926  -5.771  -0.139   7.229  32.031

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   53.926      1.881  28.669  < 2e-16 ***
grp2         -10.309      3.241  -3.181  0.00198 **
grp3          -2.956      3.456  -0.856  0.39440
sex2           1.213      2.542   0.477  0.63435

Residual standard error: 12.36 on 95 degrees of freedom
Multiple R-squared: 0.1012, Adjusted R-squared: 0.07282
F-statistic: 3.566 on 3 and 95 DF, p-value: 0.01702


Two-Way ANOVA: No Interaction Effect

# print out two-way ANOVA
> anova(fit2)
Analysis of Variance Table

Response: fwt
          Df  Sum Sq Mean Sq F value   Pr(>F)
grp        2  1600.1  800.04  5.2348 0.006971 **
sex        1    34.8   34.79  0.2277 0.634354
Residuals 95 14519.0  152.83
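`anova()` on an `lm` fit reports sequential sums of squares, so the `sex` row tests sex after adjusting for `grp`. For a one-degree-of-freedom term entered last, that F value equals the square of the coefficient's t value in `summary()`; here 0.477 squared is about 0.2275 against F = 0.2277, with the same p-value of 0.634. The sketch below checks the identity on simulated data; the design, effect size and seed are assumptions for illustration.

```r
set.seed(3)  # simulated data, not the lead study

n <- 90
g <- factor(rep(1:3, times = c(40, 30, 20)))
s <- factor(rep(c(1, 1, 2), length.out = n))
y <- 50 - 8 * (g == 2) + rnorm(n, sd = 12)

fit_add <- lm(y ~ g + s)

t_sex <- coef(summary(fit_add))["s2", "t value"]   # t test on the sex coefficient
F_sex <- anova(fit_add)["s", "F value"]            # sequential F test for sex

c(t_squared = t_sex^2, F = F_sex)
```

The same identity does not hold for the `grp` row, because `grp` is entered first and is therefore not adjusted for `sex` in the sequential table.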


Two-Way ANOVA: Interaction Effect

> fit3 = lm(fwt ~ grp*sex)
> summary(fit3)

Call:
lm(formula = fwt ~ grp * sex)

Residuals:
    Min      1Q  Median      3Q     Max
-41.270  -6.207   0.333   7.436  29.730

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  54.2703     2.0095  27.006  < 2e-16 ***
grp2        -13.8087     3.9410  -3.504 0.000707 ***
grp3         -0.1592     4.5431  -0.035 0.972128
sex2          0.3964     3.0939   0.128 0.898329
grp2:sex2    10.8087     6.7800   1.594 0.114281
grp3:sex2    -6.3647     6.8934  -0.923 0.358242

Residual standard error: 12.22 on 93 degrees of freedom
Multiple R-squared: 0.1398, Adjusted R-squared: 0.09355
F-statistic: 3.023 on 5 and 93 DF, p-value: 0.01424


Two-Way ANOVA: Interaction Effect

# print out two-way ANOVA
> anova(fit3)
Analysis of Variance Table

Response: fwt
          Df  Sum Sq Mean Sq F value   Pr(>F)
grp        2  1600.1  800.04  5.3545 0.006295 **
sex        1    34.8   34.79  0.2329 0.630535
grp:sex    2   623.3  311.67  2.0860 0.129960
Residuals 93 13895.6  149.42
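The `grp:sex` row is equivalent to comparing the additive model with the interaction model: the interaction sum of squares is the drop in residual sum of squares between the two fits (14519.0 - 13895.6 is about 623.4, matching the 623.3 in the table up to rounding). A minimal sketch of that comparison on simulated data follows; the design, effect size and seed are assumptions for illustration.

```r
set.seed(4)  # simulated data for illustration only

n <- 96
g <- factor(rep(1:3, times = c(48, 30, 18)))
s <- factor(rep(c(1, 1, 2), length.out = n))
y <- 50 - 8 * (g == 2) + rnorm(n, sd = 12)

fit_add <- lm(y ~ g + s)   # main effects only
fit_int <- lm(y ~ g * s)   # main effects plus the g:s interaction

cmp   <- anova(fit_add, fit_int)             # partial F test for the interaction
F_cmp <- cmp[2, "F"]
F_tab <- anova(fit_int)["g:s", "F value"]    # interaction row of the ANOVA table

c(model_comparison = F_cmp, anova_table = F_tab)
```

With the real data this would be `anova(fit2, fit3)`. When the interaction is not significant, as in the table above (p = 0.13), the simpler additive model `fit2` is usually the one to interpret.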


The End
