
  • Statistical Inference Course Project - Part II - Basic Inferential Data Analysis

    Francisco Nazar A.
    March 14, 2015

    Load and explore the ToothGrowth data

    library(datasets)
    data(ToothGrowth)

    There are 60 observations, each containing the variables len (length), supp (supplement) and dose. There are two supplement types (VC, OJ) and three dose levels (0.5, 1.0, 2.0). This can be checked with the names(), dim() and unique() commands. (Code not shown to save space.)
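The exploration commands mentioned above are omitted from the report. A minimal sketch of what they might look like, assuming only base R and the datasets package:

```r
library(datasets)
data(ToothGrowth)

names(ToothGrowth)         # column names of the data frame
dim(ToothGrowth)           # number of rows and columns
unique(ToothGrowth$supp)   # distinct supplement types
unique(ToothGrowth$dose)   # distinct dose levels
```

Calling str(ToothGrowth) would give a similar overview in a single command.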

    Basic data analysis.

    The data can be viewed conveniently with a boxplot:

    library(ggplot2)
    ggplot(ToothGrowth, aes(x=supp, y=len)) +
      ggtitle("ToothGrowth Length vs Supplement type by Dose") +
      geom_boxplot(aes(fill=factor(supp))) +
      geom_jitter() +
      facet_grid(.~dose)

    [Figure: "ToothGrowth Length vs Supplement type by Dose". Boxplots of len (y axis) by supp (x axis: OJ, VC), in one panel per dose (0.5, 1, 2), filled by factor(supp).]

  • In particular, for 0.5 mg doses and Vitamin C supplement (VC) the mean is 7.98

    summary(ToothGrowth[which(ToothGrowth$dose==0.5 & ToothGrowth$supp=='VC'),]$len)

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##    4.20    5.95    7.15    7.98   10.90   11.50

    Analogously (see Appendix):

    for 0.5 mg doses and Orange Juice supplement (OJ) the mean is 13.2
    for 1.0 mg doses and Vitamin C supplement (VC) the mean is 16.8
    for 1.0 mg doses and Orange Juice supplement (OJ) the mean is 22.7
    for 2.0 mg doses and Vitamin C supplement (VC) the mean is 26.1
    for 2.0 mg doses and Orange Juice supplement (OJ) the mean is 26.1
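The six group means above can also be obtained in one call rather than one summary() per group. A short sketch using aggregate() from base R; the name group_means is only illustrative and does not appear in the original report:

```r
library(datasets)
data(ToothGrowth)

# Mean tooth length for every supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

The result is a data frame with one row per (supp, dose) pair, which makes it easy to compare against the values quoted in the text.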

    Confidence intervals and hypothesis tests.

    Let us perform hypothesis tests. Below, (supp, dose) denotes the mean tooth length of that group. The first three null hypotheses concern the difference between supplements at the same dose:

    H0: (VC, 0.5) = (OJ, 0.5)
    H0: (VC, 1.0) = (OJ, 1.0)
    H0: (VC, 2.0) = (OJ, 2.0)

    Secondly, four null hypotheses concern the difference between doses for the same supplement:

    H0: (VC, 0.5) = (VC, 1.0)
    H0: (VC, 1.0) = (VC, 2.0)
    H0: (OJ, 0.5) = (OJ, 1.0)
    H0: (OJ, 1.0) = (OJ, 2.0)

    In each case the alternative hypothesis is that the two means are different (a two-sided test, as reported in the t.test output). We use the t.test function to perform the hypothesis tests. The results are as follows.

    For the supplement analysis :

    For the 0.5 mg dose, the 95% confidence interval lies entirely above 0, hence we reject the null hypothesis in favour of the alternative hypothesis. The p-value, 0.006359, is less than 0.05 and leads to the same conclusion.

    t.test(ToothGrowth[which(ToothGrowth$dose==0.5 & ToothGrowth$supp=='OJ'),]$len,
           ToothGrowth[which(ToothGrowth$dose==0.5 & ToothGrowth$supp=='VC'),]$len)

    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  ToothGrowth[which(ToothGrowth$dose == 0.5 & ToothGrowth$supp == "OJ"), ]$len and ToothGrowth[which(ToothGrowth$dose == 0.5 & ToothGrowth$supp == "VC"), ]$len
    ## t = 3.1697, df = 14.969, p-value = 0.006359
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  1.719057 8.780943
    ## sample estimates:
    ## mean of x mean of y
    ##     13.23      7.98
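To make the numbers in this output less opaque, the Welch statistic, its degrees of freedom, the p-value and the confidence interval can be recomputed by hand for the 0.5 mg dose. This is an illustrative sketch of the standard Welch two-sample formulas, not code from the original report; all variable names are assumptions.

```r
library(datasets)
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == "VC"]

# Squared standard error contributed by each group
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)
df_w <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_w)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_w) * sqrt(se2_oj + se2_vc)

print(c(t = t_stat, df = df_w, p = p_value))
print(ci)
```

The values should agree with the t.test output above, since t.test uses the unequal-variance (Welch) form by default.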

    analogously (see Appendix):

    For the 1.0 mg dose, the 95% confidence interval again lies entirely above 0, hence we reject the null hypothesis in favour of the alternative hypothesis. The p-value, 0.001038, is less than 0.05 and leads to the same conclusion.

    For the 2.0 mg dose, the 95% confidence interval contains 0, hence we cannot reject the null hypothesis. The p-value, 0.9639, is greater than 0.05 and leads to the same conclusion.
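The three supplement comparisons can be run in one pass instead of repeating the t.test call for each dose. A sketch using the formula interface of t.test; the names supp_tests and supp_pvalues are illustrative, and the formula form is a substitute for the two-vector calls used in the report:

```r
library(datasets)
data(ToothGrowth)

# One Welch test per dose level. With the formula interface the difference is
# taken as first factor level minus second, here OJ minus VC.
supp_tests <- lapply(c(0.5, 1, 2), function(d) {
  t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
})

# Collect the p-values, labelled by dose
supp_pvalues <- sapply(supp_tests, function(tt) tt$p.value)
names(supp_pvalues) <- c("0.5", "1", "2")
print(supp_pvalues)
```

Each element of supp_tests is a full test object, so confidence intervals are available as supp_tests[[i]]$conf.int.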

    For the dose analysis, the results are summarised in a table to save space:

    Supplement     Dose comparison       95% confidence interval   p-value     H0
    Orange Juice   1.0 mg with 0.5 mg    5.52 - 13.42              8.785e-05   Rejected
    Orange Juice   2.0 mg with 1.0 mg    0.19 - 6.53               0.039       Rejected
    Vitamin C      1.0 mg with 0.5 mg    6.31 - 11.27              6.811e-07   Rejected
    Vitamin C      2.0 mg with 1.0 mg    5.69 - 13.05              9.156e-05   Rejected

    In other words, at the 5% significance level, each increase in dose is associated with a larger mean tooth length for both supplements.
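The dose comparisons can be collected into a data frame with a small helper. This is a sketch only: compare_doses and dose_table are hypothetical names introduced here, not part of the original report.

```r
library(datasets)
data(ToothGrowth)

# Welch test of the higher dose against the lower dose within one supplement
compare_doses <- function(supplement, high, low) {
  x <- ToothGrowth$len[ToothGrowth$supp == supplement & ToothGrowth$dose == high]
  y <- ToothGrowth$len[ToothGrowth$supp == supplement & ToothGrowth$dose == low]
  tt <- t.test(x, y)
  data.frame(supp = supplement, high = high, low = low,
             ci_lower = tt$conf.int[1], ci_upper = tt$conf.int[2],
             p_value = tt$p.value)
}

# One row per comparison, in the same order as the table in the text
dose_table <- rbind(
  compare_doses("OJ", 1, 0.5),
  compare_doses("OJ", 2, 1),
  compare_doses("VC", 1, 0.5),
  compare_doses("VC", 2, 1)
)
print(dose_table)
```

The full t.test output for each of these four comparisons is listed in the Appendix.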

    Conclusions

    Only for the samples studied, the conclusions are:

    The results show that for low doses (0.5 mg, 1.0 mg) there is a clear difference in tooth growth between supplements: the Orange Juice supplement gives more growth than the Vitamin C supplement (means of 13.23 vs 7.98 at 0.5 mg and 22.70 vs 16.77 at 1.0 mg). On the other hand, for the 2.0 mg dose there appears to be no difference in growth between the Vitamin C and the Orange Juice supplements.

    The results also show that a larger dose is associated with more growth. All the p-values are very small except for Orange Juice from 1.0 to 2.0 mg, where the p-value (0.039) is larger but still below 0.05.

    The basic assumptions in this analysis were:

    All guinea pigs are identical, meaning that the only variation in length is due to the supplement type and the dosage.

    The variances are assumed to be unequal.

    The samples are unpaired.

    The study is double blind, so that no placebo effect is considered.


  • Appendix

    Actual computation of means and summaries

    for 0.5 mg doses and Vitamin C supplement (VC) the mean is 7.98

    summary(ToothGrowth[which(ToothGrowth$dose==0.5 & ToothGrowth$supp=='VC'),]$len)

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##    4.20    5.95    7.15    7.98   10.90   11.50

    for 0.5 mg doses and Orange Juice supplement (OJ) the mean is 13.2

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##    8.20    9.70   12.25   13.23   16.18   21.50

    for 1.0 mg doses and Vitamin C supplement (VC) the mean is 16.8

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##   13.60   15.27   16.50   16.77   17.30   22.50

    for 1.0 mg doses and Orange Juice supplement (OJ) the mean is 22.7

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##   14.50   20.30   23.45   22.70   25.65   27.30

    for 2.0 mg doses and Vitamin C supplement (VC) the mean is 26.1

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##   18.50   23.38   25.95   26.14   28.80   33.90

    for 2.0 mg doses and Orange Juice supplement (OJ) the mean is 26.1

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##   22.40   24.58   25.95   26.06   27.08   30.90

    Actual computation of confidence intervals and t.tests

    For the 0.5 mg dose, the 95% confidence interval lies entirely above 0, hence we reject the null hypothesis in favour of the alternative hypothesis. The p-value, 0.006359, is less than 0.05 and leads to the same conclusion.

    t.test(ToothGrowth[which(ToothGrowth$dose==0.5 & ToothGrowth$supp=='OJ'),]$len,
           ToothGrowth[which(ToothGrowth$dose==0.5 & ToothGrowth$supp=='VC'),]$len)

    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  ToothGrowth[which(ToothGrowth$dose == 0.5 & ToothGrowth$supp == "OJ"), ]$len and ToothGrowth[which(ToothGrowth$dose == 0.5 & ToothGrowth$supp == "VC"), ]$len
    ## t = 3.1697, df = 14.969, p-value = 0.006359
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  1.719057 8.780943
    ## sample estimates:
    ## mean of x mean of y
    ##     13.23      7.98

    For the 1.0 mg dose, the 95% confidence interval again lies entirely above 0, hence we reject the null hypothesis in favour of the alternative hypothesis. The p-value, 0.001038, is less than 0.05 and leads to the same conclusion.

    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  ToothGrowth[which(ToothGrowth$dose == 1 & ToothGrowth$supp == "OJ"), ]$len and ToothGrowth[which(ToothGrowth$dose == 1 & ToothGrowth$supp == "VC"), ]$len
    ## t = 4.0328, df = 15.358, p-value = 0.001038
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  2.802148 9.057852
    ## sample estimates:
    ## mean of x mean of y
    ##     22.70     16.77

    For the 2.0 mg dose, the 95% confidence interval contains 0, hence we cannot reject the null hypothesis. The p-value, 0.9639, is greater than 0.05 and leads to the same conclusion.

    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  ToothGrowth[which(ToothGrowth$dose == 2 & ToothGrowth$supp == "OJ"), ]$len and ToothGrowth[which(ToothGrowth$dose == 2 & ToothGrowth$supp == "VC"), ]$len
    ## t = -0.0461, df = 14.04, p-value = 0.9639
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -3.79807  3.63807
    ## sample estimates:
    ## mean of x mean of y
    ##     26.06     26.14

    For the orange juice, comparing the 0.5 mg with the 1.0 mg doses results in a rejection of the null hypothesis:

    t.test(ToothGrowth[which(ToothGrowth$dose==1.0 & ToothGrowth$supp=='OJ'),]$len,
           ToothGrowth[which(ToothGrowth$dose==0.5 & ToothGrowth$supp=='OJ'),]$len)

    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  ToothGrowth[which(ToothGrowth$dose == 1 & ToothGrowth$supp == "OJ"), ]$len and ToothGrowth[which(ToothGrowth$dose == 0.5 & ToothGrowth$supp == "OJ"), ]$len
    ## t = 5.0486, df = 17.698, p-value = 8.785e-05
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##   5.524366 13.415634
    ## sample estimates:
    ## mean of x mean of y
    ##     22.70     13.23

    For the orange juice, comparing the 1.0 mg with the 2.0 mg doses results in a rejection of the null hypothesis:

    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  ToothGrowth[which(ToothGrowth$dose == 2 & ToothGrowth$supp == "OJ"), ]$len and ToothGrowth[which(ToothGrowth$dose == 1 & ToothGrowth$supp == "OJ"), ]$len
    ## t = 2.2478, df = 15.842, p-value = 0.0392
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  0.1885575 6.5314425
    ## sample estimates:
    ## mean of x mean of y
    ##     26.06     22.70

    For the Vitamin C, comparing the 0.5 mg with the 1.0 mg doses results in a rejection of the null hypothesis:

    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  ToothGrowth[which(ToothGrowth$dose == 1 & ToothGrowth$supp == "VC"), ]$len and ToothGrowth[which(ToothGrowth$dose == 0.5 & ToothGrowth$supp == "VC"), ]$len
    ## t = 7.4634, df = 17.862, p-value = 6.811e-07
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##   6.314288 11.265712
    ## sample estimates:
    ## mean of x mean of y
    ##     16.77      7.98

    For the Vitamin C, comparing the 1.0 mg with the 2.0 mg doses results in a rejection of the null hypothesis:

    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  ToothGrowth[which(ToothGrowth$dose == 2 & ToothGrowth$supp == "VC"), ]$len and ToothGrowth[which(ToothGrowth$dose == 1 & ToothGrowth$supp == "VC"), ]$len
    ## t = 5.4698, df = 13.6, p-value = 9.156e-05
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##   5.685733 13.054267
    ## sample estimates:
    ## mean of x mean of y
    ##     26.14     16.77
