MATH 1040 Final Project Skittles Study - WordPress.com · 2016-04-26 · find real-world applications for math (beyond the basic arithmetic, of course, that I used for my taxes last

MATH 1040 Final Project

Skittles Study

Hillary Bowler

Contains:

Data Collection

Organization of Data

Confidence Interval Estimates and Hypothesis Tests

Reflection

Project Introduction:

The first step in any statistics study is the data collection. For this project, each student in our class

purchased one small packet of skittles at the grocery store—generating the best random sample possible.

We each counted and recorded the frequency of each of the five colors within our bags and submitted that

information to the professor. Below you will find the table I personally submitted, followed by the full

data set for the class. From there, my group was able to organize the data (in Part II), calculate estimates

and run tests (in Part III). My personal reflection on the assignment can be found in Part IV.

PART I: DATA COLLECTION

Colors in My Skittles Bag

Color Frequency

Red 12

Purple 9

Green 17

Yellow 12

Orange 10

Candies Database for the Final Project MATH 1040 Spring 2016

Name Red Orange Yellow Green Purple Total Count

1 Anderson, Jennifer M. 11 10 14 13 11 59

2 Bowen, Sydney 10 14 9 13 14 60

3 Bowler, Hillary A. 12 10 12 17 9 60

4 Chaires, Carmen 10 11 13 9 18 61

5 Clark, Ashley 15 14 11 11 8 59

6 Cordova, Amanda E. 9 10 16 15 5 55

7 Darger, Virginia N. 10 13 13 7 19 62

8 Dumas, Cynthia V. 12 10 11 18 8 59

9 Ferre, Bethany 13 15 12 13 8 61

10 Hanes, Joel H. 12 14 13 12 10 61

11 Hoth, Blake 10 11 16 13 9 59

12 King, Jasmine A. 13 14 4 13 14 58

13 Macey, George H. 12 8 12 12 16 60

14 Ouro-Gneni, Alamissi 11 9 14 14 13 61

15 Page, Cynthia 11 14 7 16 11 59

16 Piela, Alicia M. 12 17 10 11 8 58

17 Pratt, Eric D. 8 14 17 11 11 61

18 Ratliff, Lily M. 12 8 6 14 18 58

19 Schofield, Victoria A. 11 12 15 7 15 60

20 Silva, Joshua D. 13 18 5 9 15 60

21 Smithwick, Jillian C. 6 15 14 8 11 54

22 Sorensen, Jared L. 11 11 11 9 18 60

23 Udy, Dustin J. 16 15 11 9 7 58

24 Veroni Fioramonti, Elias F. 17 7 7 15 12 58

25 Walsh, Sarena K. 13 11 13 10 12 59

26 Wilstead, Tanea C. 14 11 9 12 10 56

Total: 304 316 295 311 310 1536

PART II: DATA ORGANIZATION

Introduction

The first step in our process was to collect the data—this came in the form of a report from each student

on their respective bag of skittles. We reported the number of each color we found in the bags we

purchased. Our objective is to study the distribution of the number of candies per bag and the frequency

of each color of candy. To better understand this, we

compared class totals and our own personal totals. We detail that comparison below.

-Categorical Data

-Discussion: Both graphs for the full class data express what we expected to see. The totals for each color

were similar: the range was only 21 for totals in the hundreds. The pie chart and Pareto chart show a very

small difference between the frequencies of each color. The story is a little different for our individual

data. Hillary clearly had more greens than anything else, Ashley had more reds, Amanda had the most

yellows, and Victoria had as many purples as yellows, more than any other color. Each of our Pareto charts shows a

very clear descending pattern and clear difference between the frequencies of colors. This is probably

because the total number of skittles in each bag is small compared to class totals, so even the smallest

difference in frequency stands out.
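To put rough numbers on that contrast, the short Python sketch below compares the gap between the most and least common color in the class totals with the same gap in a single bag (Hillary's, row 3 of the class table). The counts come from the tables in Part I; the function and variable names are only illustrative, and this is a cross-check rather than part of the original StatCrunch work.

```python
# Color counts copied from the class "Total" row and from Hillary's bag (row 3).
class_totals = {"Red": 304, "Orange": 316, "Yellow": 295, "Green": 311, "Purple": 310}
one_bag = {"Red": 12, "Orange": 10, "Yellow": 12, "Green": 17, "Purple": 9}

def shares(counts):
    """Convert raw counts to proportions of the total."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}

def spread(counts):
    """Gap between the most and least common color, as a proportion."""
    p = shares(counts)
    return max(p.values()) - min(p.values())

class_spread = spread(class_totals)
bag_spread = spread(one_bag)
print(f"class: {class_spread:.3f}, one bag: {bag_spread:.3f}")
```

The gap is about 1.4 percentage points for the whole class and about 13 percentage points for the single bag, which is why the individual Pareto charts look so much steeper.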

-Mean/Standard deviation/5-Number Summary

Mean: 59.08

Standard Deviation: 1.9

5-Number Summary

Minimum: 54

Maximum: 62

Q1: 58

Q2 (Median): 59

Q3: 60
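These summary values can be reproduced from the "Total Count" column of the class table. The sketch below uses Python's standard library and assumes the common textbook convention of taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data; StatCrunch may use a slightly different quartile rule, although it gives the same answers for this data set.

```python
import statistics

# Total candies per bag, copied from the "Total Count" column of the class table.
totals = [59, 60, 60, 61, 59, 55, 62, 59, 61, 61, 59, 58, 60,
          61, 59, 58, 61, 58, 60, 60, 54, 60, 58, 58, 59, 56]

mean = statistics.mean(totals)
sd = statistics.stdev(totals)  # sample standard deviation (n - 1 divisor)

def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of each half."""
    s = sorted(values)
    half = len(s) // 2
    lower, upper = s[:half], s[len(s) - half:]
    return (min(s), statistics.median(lower), statistics.median(s),
            statistics.median(upper), max(s))

summary = five_number_summary(totals)
print(round(mean, 2), round(sd, 1), summary)
```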

- Histogram and Box plot

-Discussion: The graphs show a fairly normal distribution of data, minus one outlier and one gap. The

histogram has no data for 57 skittles, because no one out of the 26 students had a bag with that exact

count. The box plot shows one outlier at 54, the lowest skittle count in the class. Again, we

expected a pretty normal distribution because there is little variation between students’ totals (the

standard deviation is only 1.9). All of our individual data (Hillary 60, Ashley 59, Amanda 55, Victoria 60; see a box plot of

our data below) fall within range, though Amanda’s is on the low end and more than 4 skittles away from

the mean.
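The outlier and the gap can both be checked directly from the bag totals. This sketch assumes the usual 1.5 × IQR box plot rule for flagging outliers (the rule is not stated in the project, so treat it as an assumption) and uses the quartiles from the five-number summary.

```python
# Total candies per bag, copied from the "Total Count" column of the class table.
totals = [59, 60, 60, 61, 59, 55, 62, 59, 61, 61, 59, 58, 60,
          61, 59, 58, 61, 58, 60, 60, 54, 60, 58, 58, 59, 56]

q1, q3 = 58, 60                 # quartiles from the five-number summary
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr      # values below this are flagged as outliers
high_fence = q3 + 1.5 * iqr     # values above this are flagged as outliers

outliers = [t for t in totals if t < low_fence or t > high_fence]

# Bag sizes between the minimum and maximum that nobody observed (histogram gaps).
gaps = [c for c in range(min(totals), max(totals) + 1) if c not in totals]

print(outliers, gaps)
```

With fences at 55 and 63, only the bag of 54 is flagged; Amanda's bag of 55 sits exactly on the lower fence and so is not counted as an outlier under this rule.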

Reflection: Put simply, quantitative data requires numbers and categorical data divides information by

category. The first set of graphs, pie charts and Pareto charts, are better suited to categorical data.

Categorical data is discrete and not always numerical. While we did count the quantity of each color,

color itself is not numeric. The histogram and boxplot work better for quantitative data. They were a

visual representation of the number of skittles per each bag in the class sample and their distribution. You

can’t calculate much for categorical data beyond total counts per each category. With quantitative data,

you can calculate everything from mean to standard deviation and variance. The categorical data was fun

to know, but the quantitative data of the skittles provided more calculations to help us really understand

skittle packaging patterns.

PART III: ESTIMATES AND TESTS

Introduction

For the purposes of this class, we are treating the class data as a workable sample to help us make

calculations and hypotheses for the population (or all manufactured skittles). We’ve included the

StatCrunch calculations for confidence intervals and hypothesis tests for the skittles population below.

Although the class data only includes information for 26 bags of skittles (making n < 30), we will treat

the data as if our sample is greater than 30 and assume that the data is normally distributed.

-Confidence Interval Estimates

Explain in general the purpose and meaning of a confidence interval.

The book defines a confidence interval as a range of values used to estimate the true value of a population

parameter—the parameters we’re calculating for this assignment are the true proportion, true mean and

standard deviation. The confidence level (95%, for example) describes how reliable the method is: it is the

proportion of intervals, built the same way from repeated samples, that would contain the true value of the

parameter.

Confidence Interval Estimate: True Proportion for Purple Candies

95% confidence interval results: 0.182 < p < 0.222

p : Proportion of successes

Method: Standard-Wald

Proportion Count Total Sample Prop. Std. Err. L. Limit U. Limit

p 310 1536 0.20182292 0.010240927 0.18175107 0.22189476
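As a cross-check on the StatCrunch output, the Standard-Wald interval can be computed by hand as the sample proportion plus or minus z times the standard error. The sketch below is one way to do that with Python's standard library; the function name `wald_interval` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(count, total, confidence):
    """Standard-Wald confidence interval for a population proportion."""
    p_hat = count / total
    std_err = sqrt(p_hat * (1 - p_hat) / total)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z * std_err, p_hat + z * std_err

# 310 purple candies out of 1536 in the class sample.
low, high = wald_interval(310, 1536, 0.95)
print(f"{low:.3f} < p < {high:.3f}")
```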

Confidence Interval Estimate: True Mean of Candies Per Bag

99% confidence interval results: 58.041 < μ < 60.113

μ : Mean of variable

Variable Sample Mean Std. Err. DF L. Limit U. Limit

Total Candies 59.076923 0.37178603 25 58.040593 60.113253
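The same interval can be rebuilt from the summary statistics as the sample mean plus or minus t times the standard error. Python's standard library has no t distribution, so this sketch hard-codes the critical value 2.787 from a standard t table (25 degrees of freedom, 0.005 in each tail); that value is an input taken from a table, not something computed here.

```python
from math import sqrt

n = 26                     # number of bags in the class sample
sample_mean = 59.076923    # from the StatCrunch output
sample_var = 3.5938462     # sample variance, from the StatCrunch output
t_crit = 2.787             # t-table value: df = 25, 99% confidence (two-sided)

std_err = sqrt(sample_var / n)
margin = t_crit * std_err
low, high = sample_mean - margin, sample_mean + margin

print(f"std. err. = {std_err:.4f}")
print(f"{low:.3f} < mu < {high:.3f}")
```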

Confidence Interval Estimate: Standard Deviation of Candies Per Bag

98% confidence interval results: 1.424 < σ < 2.792

σ² : Variance of variable

σ : Standard deviation, equal to the square root of the variance limits below

Variable Sample Var. DF L. Limit U. Limit

Total Candies 3.5938462 25 2.0274843 7.7964549
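The link between the variance limits in the table and the standard deviation interval quoted above is just a square root. The sketch below reproduces both, using chi-square critical values for 25 degrees of freedom taken from a standard table (44.314 and 11.524 for 0.01 in each tail); those two constants are table look-ups, not computed by the code.

```python
from math import sqrt

df = 25                   # degrees of freedom, n - 1
sample_var = 3.5938462    # sample variance, from the StatCrunch output
chi2_right = 44.314       # chi-square table value, df = 25, right-tail area 0.01
chi2_left = 11.524        # chi-square table value, df = 25, left-tail area 0.01

# Interval for the variance: (n - 1) s^2 divided by each critical value.
var_low = df * sample_var / chi2_right
var_high = df * sample_var / chi2_left

# Interval for the standard deviation: square roots of the variance limits.
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(f"{var_low:.3f} < variance < {var_high:.3f}")
print(f"{sd_low:.3f} < sigma < {sd_high:.3f}")
```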

-Interpretation

Each of these answers, written out, is as follows:

- We can be 95% confident that the interval between 0.182 and 0.222 actually does contain the

value of the true population proportion p of purple candies.

- We can be 99% confident that the interval between 58.041 and 60.113 does in fact contain the

value of the true population mean μ.

- We can be 98% confident that the interval between 1.424 and 2.792 does contain the standard

deviation σ of the population (equivalently, the interval between 2.027 and 7.796 contains the variance σ²).

All of this means that if we were to select many different samples of the same size and construct a

confidence interval from each one in the same way, the stated percentage of those intervals would contain the true parameter.
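A small simulation can make the meaning of a confidence level concrete. The sketch below is purely illustrative: it assumes a "true" proportion of 0.20 (chosen for the simulation, not known from the data), draws many samples of 1536 candies, builds a 95% Wald interval from each, and counts how often the interval contains the assumed true value.

```python
import random
from math import sqrt

random.seed(1040)   # fixed seed so the run is repeatable

TRUE_P = 0.20       # assumed true proportion, chosen only for the simulation
N = 1536            # same sample size as the class data
Z_95 = 1.96         # standard normal critical value for 95% confidence
TRIALS = 1000       # number of simulated samples

hits = 0
for _ in range(TRIALS):
    # Draw one sample and count the "successes" (candies of the chosen color).
    count = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = count / N
    margin = Z_95 * sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - margin <= TRUE_P <= p_hat + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the intervals contained the true proportion")
```

The exact percentage depends on the random draws, but it should land close to 95%.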

-Hypothesis Tests

Explain in general the purpose and meaning of a hypothesis test.

A hypothesis is a claim about a property of a population—making a hypothesis test a procedure for

testing such claims. Hypothesis tests are used to determine the validity of assumptions about data. In this

assignment, for example, we test the claim that 20% of all Skittles are green and that the mean number of

candies per bag of Skittles is 56. The null hypothesis states that the parameter equals the claimed value, and the alternative hypothesis is a

statement that the parameter differs from it; the test weighs the sample evidence against the null hypothesis.

Hypothesis test results: proportion of green candies

p : Proportion of successes

H0 : p = 0.2

HA : p ≠ 0.2

Proportion Count Total Sample Prop. Std. Err. Z-Stat P-value

p 311 1536 0.20247396 0.010206207 0.24239742 0.8085
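The Z-statistic and P-value in this table can be checked by hand. Note that the standard error here uses the claimed proportion 0.2 rather than the sample proportion, which is why it differs slightly from the standard error in the confidence interval output. The function name `one_proportion_z_test` is made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(count, total, p0):
    """Two-sided z test of H0: p = p0 for a single proportion."""
    p_hat = count / total
    std_err = sqrt(p0 * (1 - p0) / total)   # standard error under H0
    z = (p_hat - p0) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# 311 green candies out of 1536, tested against the claim p = 0.2.
z_stat, p_value = one_proportion_z_test(311, 1536, 0.2)
print(f"z = {z_stat:.4f}, P-value = {p_value:.4f}")
```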

Hypothesis test results: mean number of candies per bag

μ : Mean of variable

H0 : μ = 56

HA : μ ≠ 56

Variable Sample Mean Std. Err. DF T-Stat P-value

Total Candies 59.076923 0.37178603 25 8.2760589 <0.0001
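The T-statistic is the distance between the sample mean and the claimed mean, measured in standard errors. The sketch below recomputes it from the summary statistics. Because Python's standard library has no t distribution, it does not compute the P-value; instead it compares the statistic with 2.060, the t-table critical value for 25 degrees of freedom at a two-sided 0.05 significance level (a table look-up, stated here as an assumption).

```python
from math import sqrt

n = 26                     # number of bags in the class sample
sample_mean = 59.076923    # from the StatCrunch output
sample_var = 3.5938462     # sample variance, from the StatCrunch output
mu_0 = 56                  # claimed mean number of candies per bag (H0)
t_crit = 2.060             # t-table value: df = 25, two-sided alpha = 0.05

std_err = sqrt(sample_var / n)
t_stat = (sample_mean - mu_0) / std_err
reject_null = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, reject H0: {reject_null}")
```

A statistic above 8 is far beyond the critical value, which agrees with the very small P-value StatCrunch reports.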

-Interpretation

For the claim that 20% of all Skittles are green, we have a P-value of 0.809. Because 0.809 > 0.01 (level

of significance) we can conclude the following: We fail to reject the null hypothesis that 20% of all

Skittles are green.

For the claim that the mean number of candies per bag of Skittles is 56, our P-value is less than 0.0001,

extremely small and far lower than the significance level of 0.05. Because the P-value is below 0.05, we reject the

null hypothesis that the mean number of candies per bag is 56.

Small p-values indicate strong evidence against a null hypothesis and larger p-values indicate weak

evidence against it, hence the failure to reject the first hypothesis and the rejection of the second.

PART IV: REFLECTIVE ESSAY

“Thank goodness for technology!” is by far the most repeated thought I’ve had during this class,

and I’m sure that rings true even for professional statisticians. I was grateful for the practical applications

in the project. They helped me grow more comfortable with using software like StatCrunch, and I was

even able to utilize my calculator (a TI-84) to compare results on some of the calculations. I found that

the use of technology not only saved time, but also helped me see the “big picture” of the data more

easily. I expect that the things I’ve learned about math technologies will help me get to conclusions faster

and go to greater depths with data in future research projects.

The parts of the project I found most helpful and applicable were the discussions and

interpretations. Since I have never planned on being any sort of mathematician, I have always struggled to

find real-world applications for math (beyond the basic arithmetic, of course, that I used for my taxes last

month). But I discovered that statistics is as much about interpretation and conclusion as it is the numbers.

For example, your hypothesis test isn’t finished until you have drawn a conclusion: either you reject the

null hypothesis or you fail to reject it.

The ability to collect and interpret data would be helpful in most any field, but I’m looking

forward to how helpful it will be to me personally. I will be pursuing studies in psychotherapy and will be

called upon to collect my own data, build my own research and draw conclusions. This project helped me

imagine doing that. It’s easy to feel like I am just plugging in numbers on a lot of the assignments—but

for this, I actually physically counted my personal sample and had a better vision of what the numbers

really meant.

On the one hand, skittles doesn’t feel very “real-world,” but I understand why we would need to

keep things fairly simple for the class. I clearly can’t say that knowing the approximate mean number of

candies per skittles bag is very applicable knowledge. But conducting my own statistics research from

beginning to end made the math real enough to me that I can now more easily envision applying these

skills to real-life studies.
