Laura Boren
MATH 1040 Skittles Data Project
For our project in MATH 1040, everyone in the class was asked to buy a 2.17-ounce individual-sized
bag of Skittles and count the number of each color of candy in the bag. The class data was
compiled and we used it for a number of exercises, each involving a different aspect of
statistics.
For the first part of the project, we determined the proportion of each color of candy and
created a Pareto chart and a pie chart for the total number of each color of candies in the entire
class. We compared the class data to our own personal data and noted any similarities or
differences.
For part 2 of the project we used the Skittles data to compute summary statistics: the
mean, standard deviation, and 5-number summary. We made a frequency histogram of the total
number of candies per bag as well as a box plot. Individually, I also wrote a paragraph about the
significance of different qualitative and quantitative methods of analysis.
The last part of the project involved confidence intervals. We found three
confidence intervals, one each for the population proportion, mean, and standard deviation, and
wrote an analysis of what each confidence interval meant.
Laura Boren, Melissa Oneal, Justin Peck, Nathan Schafer
Math 1040 Class Skittles Proportions

Color                                   Count   Proportion of Total
Red Skittles                              564   0.199
Orange Skittles                           564   0.199
Green Skittles                            566   0.199
Purple Skittles                           559   0.197
Yellow Skittles                           586   0.206
Total Number of Skittles in the class    2839   1.000
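The proportions in the table can be reproduced with a short script. This is a minimal sketch rather than the method the class actually used; the variable names are illustrative, and the counts are copied from the table above.

```python
# Class color counts copied from the class proportions table.
class_counts = {
    "Red": 564,
    "Orange": 564,
    "Green": 566,
    "Purple": 559,
    "Yellow": 586,
}

total = sum(class_counts.values())

# Proportion of each color, rounded to three decimals as in the table.
proportions = {
    color: round(count / total, 3) for color, count in class_counts.items()
}

# A Pareto chart orders the bars from the most to the least frequent category.
pareto_order = sorted(class_counts, key=class_counts.get, reverse=True)

print("Total candies:", total)
for color in pareto_order:
    print(f"{color:<8}{class_counts[color]:>6}{proportions[color]:>8.3f}")
```

Sorting by count is the only step a Pareto chart needs beyond an ordinary bar chart, which is why the order is computed separately from the proportions.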
MATH 1040 Skittles Data
Does the Class data represent a random sample?
Yes, the class data does represent a random sample. Each student was asked to buy their own
bag of Skittles, so not every bag of Skittles in the region had an equal chance of being selected.
However, the distribution of Skittles from the central plant/warehouse was most likely random. The
Skittles company most likely does not count colors as it loads the bags and simply loads by weight.
Assuming students did not make any biased decisions about which bag to grab off the shelf, every bag
produced had an equal chance of being shipped to any location in the country and being selected at
random by a student in the class.
What would the population be?
In this study, the sample is the class data. Since not everyone in the class is currently living in the same
state, the population would be all 2.17-ounce Skittles bags in the United States. Other manufacturing
plants currently operate overseas, so the population can only reasonably be
expanded to include the United States distribution circuit.
Laura Boren
Math 1040 Skittles Data

Skittles Color     Class Total   Proportion   My Total   Proportion
Red Skittles               564        0.199         16        0.258
Yellow Skittles            586        0.206         11        0.177
Orange Skittles            564        0.199         10        0.161
Green Skittles             566        0.199         15        0.242
Purple Skittles            559        0.197         10        0.161
Total Skittles            2839                      62
My Skittles bag differed quite a bit from the class data. My bag had noticeably higher proportions of
red and green Skittles than the class total, but as in the class data, purple was the least common color
(tied with orange in my bag). I had always assumed that red was the most common Skittles color, but
that may just be because the vibrancy of the color red makes it more noticeable. In my bag red was the
most common color, but that was not supported by the class data. I was surprised to see yellow Skittles
being the most common in the class.
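The comparison between one bag and the class totals can be made concrete by subtracting the two sets of proportions. The sketch below uses the counts reported in the table; the dictionary and variable names are illustrative only.

```python
# Counts copied from the comparison table (class totals and one student's bag).
class_counts = {"Red": 564, "Yellow": 586, "Orange": 564, "Green": 566, "Purple": 559}
my_counts = {"Red": 16, "Yellow": 11, "Orange": 10, "Green": 15, "Purple": 10}

class_total = sum(class_counts.values())
my_total = sum(my_counts.values())

# Positive difference: the color is over-represented in the single bag
# relative to the class; negative: under-represented.
differences = {
    color: my_counts[color] / my_total - class_counts[color] / class_total
    for color in class_counts
}

for color, diff in sorted(differences.items(), key=lambda item: item[1], reverse=True):
    print(f"{color:<8}{diff:+.3f}")
```

With only 62 candies in a single bag, differences of a few percentage points from the class proportions are not surprising.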
1. Using the total number of candies in each bag in our class sample, compute the
following measures for the variable “Total candies in each bag”:
(a) mean number of candies per bag
The mean number of candies per bag is 59.1 candies.
(b) standard deviation of the number of candies per bag
The standard deviation per bag is 6.4 candies.
(c) 5-number summary for the number of candies per bag
The 5-number summary is minimum 34, Q1 58, median 60, Q3 62, maximum 71.
Report these summary statistics rounded to one decimal place, if needed.
Math 1040 Skittle Data 2015
Laura Boren
Skittles Data Part 3
1. From these graphs we can conclude that the frequency histogram is skewed to the left,
although our boxplot appeared rather symmetrical, likely because the number line did not have
small enough increments. This skew is consistent with the summary statistics: the median
number of candies per bag is 60, but the mean is only 59.1. One of the main causes of the
negative skew is that several of the Skittles bags had only 30-40 candies in them, which is roughly
one half to two thirds of the median number of Skittles per bag. Those bags are outliers and pull the
mean toward the left. My data agrees with the data collected by the whole class because the
highest frequency of candies per bag was in the class of 60-65 candies per bag. My bag had 62
candies, which falls right in that class.
2. Categorical variables are also known as qualitative variables. These variables can be put
into different categories, such as a model of car, color, gender, etc. Quantitative data is data that
can be ordered and measured. The number of candies in a bag of skittles is quantitative, whereas
the color of the candy is categorical.
Graphing quantitative data is best done with histograms, stem-and-leaf plots, dot plots, bar
graphs, and box plots. All of these types of graphs can be used to measure the quantity of a
certain variable. Categorical data is best graphed using a method that lets you compare the
groups to one another. A bar graph can work for both quantitative and categorical data, but a pie
chart doesn’t make sense for quantitative data because it is comparing categories to the whole. A
pie chart would effectively show the percentage of each color of skittles in a bag (categorical
data), but cannot effectively be used to show the number of skittles in a bag (quantitative data).
When it comes to calculations, mean and median only make sense for quantitative data.
The mean is the average value of a variable across an entire sample, so it is only
meaningful when applied to quantitative data. The median represents the middle
value of the ordered data and likewise makes sense only when applied to quantitative data.
The best measure of central tendency to apply to categorical data is the mode. When looking at the
colors of candy in a Skittles bag, you may not be able to find the average color or the median color,
but you can establish which color occurs the most often. Likewise, when looking at the number of
candies in a Skittles bag, the most useful measures of center are going to be the mean
and median number of Skittles.
Laura Boren, Nathan Schafer, Justin Peck, Melissa Oneal
99% Confidence Interval estimate for the population proportion of yellow candies
x = 586
n = 2839
z-value for 99% CI = 2.576
p = 586/2839 = 0.206
Standard error = √[p(1 - p)/n] = 0.007596
0.206 +/- 2.576 * (0.007596)
0.206 +/- 0.01957
99% Confidence Interval Estimate: (0.186, 0.226)
Confidence intervals for a population proportion are used to estimate, with the
specified degree of confidence, the proportion of a characteristic found within a population. In
relation to the Skittles, we are 99% confident that the proportion of yellow candies among all
Skittles in the population falls between 0.186 and 0.226.
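The arithmetic for the proportion interval can be checked with a few lines of Python. This is a sketch of the standard normal-approximation interval, using only the standard library; the variable names are illustrative, and the critical value is computed instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

x = 586            # yellow candies in the class sample
n = 2839           # total candies in the class sample
confidence = 0.99

p_hat = x / n
# Two-sided critical value: about 2.576 for 99% confidence.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error

lower, upper = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.4f}, SE = {standard_error:.6f}, margin = {margin:.5f}")
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```

Because this sketch carries the unrounded sample proportion through the calculation, its lower bound can differ in the third decimal place from a hand calculation that starts from the rounded value 0.206.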
95% Confidence Interval estimate for the population mean number of skittles per bag
n= 49
Sx = 6.38
Sample mean= 59.15
Standard error of the mean = 0.9114
To find the t-value, a t-table was consulted. The degrees of freedom are n - 1 = 48; the t-table row
for 50 degrees of freedom was used, giving a t-value of 2.009.
59.15 +/- 2.009 * (0.9114)
59.15 + 1.83 = 60.98
59.15- 1.83 = 57.32
95% Confidence Interval Estimate: (57.32, 60.98)
Confidence interval estimates of the population mean use sample data to construct an interval that,
with the specified degree of confidence, should contain the mean of a characteristic of the population.
In this case, we are 95% confident that the population mean number of Skittles per bag is between
57.32 and 60.98.
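The mean interval can be reproduced the same way. In this sketch the t critical value is not computed; it is the table value 2.009 quoted in the report, since the Python standard library has no t-distribution. The variable names are illustrative.

```python
from math import sqrt

n = 49               # bags in the class sample
sample_mean = 59.15  # mean candies per bag
s = 6.38             # sample standard deviation
t_critical = 2.009   # table value quoted in the report (row for 50 degrees of freedom)

standard_error = s / sqrt(n)
margin = t_critical * standard_error

lower, upper = sample_mean - margin, sample_mean + margin
print(f"SE = {standard_error:.4f}, margin = {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The critical value for exactly 48 degrees of freedom is slightly larger than the table value for 50, so an interval computed with it would be marginally wider.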
98% confidence interval estimate for the population standard deviation of the number of
candies per bag
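n = 49
s = 6.378
s² = 40.679
Area to the right of χ²(1-α/2) = 0.99
Area to the right of χ²(α/2) = 0.01
The degrees of freedom are n - 1 = 48. On the chi-square distribution chart, the row for 50 degrees
of freedom was used. The value for χ²(1-α/2) was 29.707. For χ²(α/2) it was 76.154.
Bounds: √[s²(df)/χ² value], with df = 48
Lower bound: √[40.679(48)/76.154] = 5.06
Upper bound: √[40.679(48)/29.707] = 8.11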
Confidence interval estimates of the population standard deviation use the sample standard
deviation to generate an interval that the population standard deviation of the number of
candies should fall within, with the specified level of confidence. In this case, we are 98%
confident that the population standard deviation is between 5.06 and 8.11 candies. The problem
with confidence interval estimates taken from the sample standard deviation is that the sample
standard deviation may be quite different from the actual population standard deviation.
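The bounds for the standard deviation follow from the chi-square formula. The sketch below takes the two chi-square critical values directly from the report (29.707 and 76.154) because the Python standard library does not provide a chi-square distribution; the variable names are illustrative.

```python
from math import sqrt

n = 49
s = 6.378                # sample standard deviation
df = n - 1               # degrees of freedom in the numerator

# Chi-square table values quoted in the report (row for 50 degrees of freedom).
chi2_small = 29.707      # area 0.99 to the right
chi2_large = 76.154      # area 0.01 to the right

# The larger critical value gives the lower bound and vice versa.
lower = sqrt(df * s**2 / chi2_large)
upper = sqrt(df * s**2 / chi2_small)

print(f"s^2 = {s**2:.3f}")
print(f"98% CI for sigma: ({lower:.2f}, {upper:.2f})")
```

Unlike the intervals for the proportion and the mean, this interval is not symmetric around the sample value, because the chi-square distribution itself is not symmetric.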
Laura Boren
The purpose of taking sample data and calculating statistics from them is to apply those
statistics to a larger population. Since a population is larger than a sample, how well a sample
statistic can be used to estimate a population parameter is an issue. A confidence interval helps to
solve that issue by allowing us to provide a range of values that the population parameter is
likely to fall within. The intervals are constructed with a certain level of confidence, reflected as
a percentage such as 95%, 98% or 99%. This means that if repeated samples were drawn from the
same population and an interval were calculated from each one, about X% of those intervals
would contain the true parameter.
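The repeated-sampling meaning of a confidence level can be illustrated with a small simulation. Everything about the population here is an assumption made for illustration: a normal population with a hypothetical mean of 60 and standard deviation of 6, which are not values from the class data. To stay within the standard library, the sketch also treats the population standard deviation as known and uses a z-interval, which is simpler than the t-interval used in the project.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1040)  # fixed seed so the run is repeatable

TRUE_MEAN = 60.0   # hypothetical population mean (assumption)
TRUE_SD = 6.0      # hypothetical population standard deviation (assumption)
N = 49             # sample size, matching the number of bags in the class
TRIALS = 2000      # number of repeated samples
CONFIDENCE = 0.95

z = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
margin = z * TRUE_SD / sqrt(N)

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    sample_mean = fmean(sample)
    # Count the interval as a hit when it contains the true mean.
    if sample_mean - margin <= TRUE_MEAN <= sample_mean + margin:
        hits += 1

coverage = hits / TRIALS
print(f"Intervals containing the true mean: {coverage:.1%}")
```

The printed share should land close to 95%, which is what the confidence level describes: the long-run success rate of the method, not the probability that any single interval is correct.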
Laura Boren
Skittle Project Reflection
When I first started the Skittles project, I was intimidated by the process of using
statistical concepts to interpret real-life data. As the project went on I became much more
comfortable with concepts such as confidence intervals and creating Pareto charts and frequency
histograms. In my volunteer work as a lactation educator and also as a nursing student I
sometimes find myself reading and interpreting peer-reviewed clinical research. Understanding
what things like confidence intervals are and what makes data significant or unusual is very
helpful in interpreting such studies and thinking critically about what the data actually means.
There are even some aspects of statistics that I used before taking this class. In Human
Physiology we were required to calculate the mean, median, and standard deviation of lung
inspiratory volume as part of our laboratory unit on the respiratory system.
Taking calculus really helped me to understand real-world math applications, and
statistics only reinforced what I already knew about the practicality of math. Statistics is a very
fundamental part of scientific literacy and has numerous applications in the world of business
and economics. Completing the Skittles project helped me to understand how businesses and
corporations might need to use statistics, particularly standard deviations, in order to produce
accurate and consistent products. Statistics can also be used to calculate demand and determine
shipping and distribution needs, and evaluate product quality and customer satisfaction. In our
skittles project we determined the average proportion of each color of skittles candy that came in
a bag as well as a confidence interval of that population proportion. This could be helpful in
evaluating customer candy preferences and overall satisfaction based on flavor preference. A
company might use similar statistics in real life to ensure product standardization.