Stats Homework 5


  • 7/23/2019 Stats Homework 5


    The pH scale is an indicator of soil acidity. Suitable soil acidity is essential to maintaining a healthy growing environment for different types of plants. As such, it may be useful to determine the difference in acidity between various growing plots so that one may plant and harvest the largest yield possible. Such an exercise would certainly be a notch on the belt of any budding agronomist. No pun intended. At all.

    In order to determine the basic frames and vectors of the data, an exploratory analysis was performed in RStudio. The data was found to have the following characteristics.

    The data for soil pH based on location can be described as:

    Location  Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
    A         (.)    (.)*     (.+'    ).')   ).%      ).*'
    B         (.'    (.(      (.'*    (.//   (.*)     (.)*

    To get a basic idea of how the data is distributed, a boxplot comparing the locations was prepared in RStudio.
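The summary and boxplot described above could be reproduced along these lines. This is only a minimal sketch: the data frame `soil`, its column names, and the pH readings are invented for illustration and are not the original measurements.

```r
# Hypothetical soil pH readings, 9 per location (made-up values).
soil <- data.frame(
  location = rep(c("A", "B"), each = 9),
  ph = c(7.8, 7.8, 7.9, 7.9, 7.9, 8.0, 8.1, 8.4, 8.5,
         7.3, 7.3, 7.4, 7.4, 7.4, 7.5, 7.6, 7.6, 7.9)
)

# Min, 1st Qu., Median, Mean, 3rd Qu. and Max for each location.
ph_summary <- lapply(split(soil$ph, soil$location), summary)
print(ph_summary)

# Side-by-side boxplot; the returned object holds the plotted statistics.
bp <- boxplot(ph ~ location, data = soil,
              xlab = "Location", ylab = "Soil pH")
```

Splitting by location first keeps the two groups separate, which matches the per-location layout of the summary in the text.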

    In order to understand what this data is telling us, it would be a good idea to calculate confidence intervals. But should those be calculated with the t or the Z distribution? Since our sample size of n = 9 is extremely small, the choice of distribution will depend on how normally the data is distributed. A density plot of the data was utilized to get a basic idea of our distribution.
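A density plot of this kind can be sketched as below. The readings are invented, and the Shapiro-Wilk test is an addition that the original analysis does not mention; with only 9 observations it has little power, so it is at best a rough companion to the plot.

```r
# Hypothetical pH readings for one location (made-up values, n = 9).
ph_a <- c(7.8, 7.8, 7.9, 7.9, 7.9, 8.0, 8.1, 8.4, 8.5)

# Kernel density estimate; plot() draws the estimated curve.
dens <- density(ph_a)
plot(dens, main = "Density of soil pH (hypothetical data)")

# Shapiro-Wilk normality test (not part of the original write-up).
sw <- shapiro.test(ph_a)
print(sw$p.value)
```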

    The data appears to be pretty far from normally distributed. Thus it wouldn't really make sense to compute a confidence interval using the t distribution. In order to calculate the interval, use the sample statistic (difference in means) plus/minus t* multiplied by the standard error (SE). Utilizing this process with a t* of 1.86 (gleaned from a 95% confidence interval and df = 8) gives confidence intervals of (.+)', .??'/(). Looking at these confidence intervals, we notice that they do not in fact cross, so there is little reason to believe that a hypothesis test is necessary, given that our 95% confidence intervals show that the data do not intersect based on a difference in means. The standard deviation used in the calculation of confidence intervals was gleaned from a bootstrap distribution of differences in means. That calculation was made using the call sd(replicate(%, mean(sample(locationA, size=9, replace=TRUE)) - mean(sample(locationB, size=9, replace=TRUE)))). This put out a value of .%*+/'. This was then taken over the square root of the sample size. The t distribution used in calculating the confidence interval had less than '. Some data simulation may be useful to understand what our data is telling us.
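The interval calculation described above can be sketched as follows. The pH vectors are invented, the replicate count of 1000 is an assumption, and `set.seed` is added only for reproducibility. Two points are worth hedging. First, `qt(0.95, df = 8)` gives roughly 1.86, whereas a two-sided 95% interval would normally use `qt(0.975, df = 8)`, roughly 2.31. Second, the standard deviation of a bootstrap distribution of a statistic is already an estimate of that statistic's standard error, so this sketch does not divide it by the square root of the sample size.

```r
set.seed(1)  # for reproducibility only; not part of the original call

# Hypothetical pH readings (made-up values, 9 per location).
locationA <- c(7.8, 7.8, 7.9, 7.9, 7.9, 8.0, 8.1, 8.4, 8.5)
locationB <- c(7.3, 7.3, 7.4, 7.4, 7.4, 7.5, 7.6, 7.6, 7.9)

# Bootstrap distribution of the difference in means (1000 replicates assumed).
boot_diffs <- replicate(1000,
  mean(sample(locationA, size = 9, replace = TRUE)) -
    mean(sample(locationB, size = 9, replace = TRUE)))
se_boot <- sd(boot_diffs)  # bootstrap estimate of the standard error

obs_diff <- mean(locationA) - mean(locationB)

t_one_sided <- qt(0.95, df = 8)   # about 1.86, the multiplier quoted in the text
t_two_sided <- qt(0.975, df = 8)  # about 2.31, the usual two-sided 95% multiplier

# Interval: statistic plus/minus t* times SE, using df = 8 as in the text.
ci <- obs_diff + c(-1, 1) * t_two_sided * se_boot
print(ci)
```

If the resulting interval excludes 0, that points the same way as the conclusion drawn in the text, although the width depends on which multiplier and standard error are used.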

    Problem: Ice Cream

    An exploratory analysis of the data was already performed in previous assignments. The exploratory analysis is pasted here for convenience:

    The feud between vanilla and chocolate has created enemies in ice and rain alike. Just how to make the decision between two scrumptious delights is very perplexing to the uneducated consumer. In considering potential factors resulting in consumer preference, caloric intake may contribute to the preference curve of a chocolate or vanilla consumer. It was hypothesized that the caloric content of chocolate vs. vanilla ice-cream would have a null effect on consumer utility derived from said product. To test that hypothesis, a population of (+ ice-cream brands, both chocolate and vanilla, was assembled and calorie content was noted.

    Initially looking at the data, the variable calories was separated by flavor to compare chocolate vs. vanilla. The subset of chocolate ice-cream was found to have the following characteristics:

    Min.  1st Qu.  Median  Mean    3rd Qu.  Max.
    %     %/       %(      %+).(   ?        '

    In addition, the standard deviation of the chocolate subset was ?'.?. This value represents a standard deviation equivalent to '%.(' percent of the mean. Therefore, the data varies considerably around the mean.

    In the case of the vanilla subset, the same approach was repeated, this time isolating vanilla as the flavor of interest. The subset of vanilla was found to have the following characteristics.

    Min.  1st Qu.  Median  Mean    3rd Qu.  Max.
    %     %/       %?      %+%./   /        '(

    In addition, the standard deviation of the vanilla subset was *).?//. This value represents a standard deviation equivalent to '.?' percent of the mean. Therefore, the data varies considerably around the mean.
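The "standard deviation as a percentage of the mean" figure quoted for each flavor is the coefficient of variation. A small sketch of that arithmetic follows; the helper name `sd_pct_of_mean` and the calorie values are hypothetical.

```r
# Standard deviation expressed as a percentage of the mean.
sd_pct_of_mean <- function(x) 100 * sd(x) / mean(x)

# Hypothetical calorie counts per serving (made-up values).
chocolate <- c(100, 140, 170, 260, 300)
vanilla   <- c(100, 140, 160, 240, 370)

print(sd_pct_of_mean(chocolate))
print(sd_pct_of_mean(vanilla))
```

A value near 30 percent means that a typical brand sits roughly a third of the mean away from it, which is what "varies considerably" refers to.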


    The boxplot of both subsets is as follows:

    In this latest installment of the ice-cream conundrum, we examine the data in terms of its paired brand. In order to do that, we have to rearrange the data by brand. The dataset was imported into RStudio, melted and recast with brand as the identifying variable and chocolate and vanilla as dependents. The difference between chocolate and vanilla calorie content by brand was then computed by subtracting the brand's vanilla value from its chocolate value. These values were aggregated into the difference set "diff." The data regarding the differences in ice-cream calorie content by brand yielded the following characteristics:

    Min.  1st Qu.  Median  Mean  3rd Qu.  Max.
    B%( + (.'' % *
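The rearrangement by brand might look like the sketch below. The text describes melting and recasting; base R's `reshape()` is swapped in here so the example needs no extra packages. The data frame, brand labels, and calorie values are all invented.

```r
# Hypothetical long-format data: one row per brand and flavor (made-up values).
ice_long <- data.frame(
  brand    = rep(c("b1", "b2", "b3", "b4"), each = 2),
  flavor   = rep(c("chocolate", "vanilla"), times = 4),
  calories = c(150, 140, 200, 210, 260, 240, 170, 170)
)

# One row per brand, with calories.chocolate and calories.vanilla columns.
ice_wide <- reshape(ice_long, idvar = "brand", timevar = "flavor",
                    direction = "wide")

# Paired difference per brand: chocolate minus vanilla.
ice_wide$diff <- ice_wide$calories.chocolate - ice_wide$calories.vanilla
print(summary(ice_wide$diff))
```

Working with one difference per brand is what makes the later test a paired one: each brand acts as its own control.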


    The resulting data points were bootstrapped to gain a standard deviation estimate of the total population:
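A bootstrap of the brand-level differences can be sketched as follows. The differences, the replicate count of 10000, and the seed are assumptions chosen for illustration.

```r
set.seed(42)  # for reproducibility only

# Hypothetical chocolate-minus-vanilla differences by brand (made-up values).
diffs <- c(10, -10, 20, 0, 30, 10, -20, 40, 0, 10)

# Resample the differences with replacement and record each resample's mean.
boot_means <- replicate(10000,
  mean(sample(diffs, size = length(diffs), replace = TRUE)))

# The spread of the bootstrap means estimates the standard error of the mean.
boot_se <- sd(boot_means)
print(boot_se)
```

The value should land close to `sd(diffs) / sqrt(length(diffs))`, the classical standard error of the mean.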


    With this data, the t test was conducted using the population bootstrap mean subtracted from the initial point mean, and the standard deviation of the bootstrap mean over the square root of the sample size n. This gave a t score value of '.(++++). The p value associated with that level of t on a t distribution with


    Based on this two-tailed distribution, we have sufficient evidence to reject the null hypothesis that the mean of differences between calories by ice-cream flavor by brand is equal to 0. Therefore, there is a difference.
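For comparison, a two-tailed paired t test of the null hypothesis that the mean difference is 0 can be sketched as below. It uses the classical standard error `sd(d) / sqrt(n)` rather than the bootstrap estimate described above, and the calorie values are invented, so the numbers will not match those reported here.

```r
# Hypothetical calories per brand, in the same brand order (made-up values).
chocolate <- c(150, 200, 260, 170, 310, 140)
vanilla   <- c(140, 210, 240, 170, 280, 130)

d <- chocolate - vanilla  # paired differences
n <- length(d)

# t statistic for H0: mean difference = 0, then the two-tailed p value.
t_stat <- mean(d) / (sd(d) / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

# Built-in paired test as a cross-check on the manual calculation.
check <- t.test(chocolate, vanilla, paired = TRUE)
print(c(t = t_stat, p = p_val))
```

The manual statistic and `t.test()` should agree, which is a quick way to confirm that the standard error and degrees of freedom were set up as intended.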