Math 1040 Skittles Term Project
Changhun Kim
Math 1040
12/10/17
Report Introduction
The goal of this project is to compare data collected by the class for 2.17 oz bags of Skittles. I was asked
to count each color and report the totals to my teacher, who then compiled the data. Once all the data was
collected, I applied concepts learned in class to produce the charts and figures below.
Data Collection
Each student in the class will purchase one 2.17-ounce bag of Original Skittles and record the following
data:
My bag results:
Red Orange Yellow Green Purple Total
11 10 13 12 11 57
0.193 0.175 0.228 0.211 0.193 1
Entire class results:
Red Orange Yellow Green Purple Total
332 286 370 270 312 1570
0.211 0.182 0.236 0.172 0.199 1
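As a rough check, the proportions in the class table can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the table above, and the function name color_proportions is my own choice for illustration.

```python
# Class color counts, copied from the "Entire class results" table.
class_counts = {"Red": 332, "Orange": 286, "Yellow": 370, "Green": 270, "Purple": 312}


def color_proportions(counts):
    """Return each color's share of the total, rounded to three decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


class_total = sum(class_counts.values())
proportions = color_proportions(class_counts)

print("Total:", class_total)
for color, share in proportions.items():
    print(f"{color}: {share:.3f}")
```

The same function can be applied to the counts from a single bag to get that bag's proportions.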
First, each member of the class purchased a 2.17-ounce bag of Original Skittles. I searched far and
wide, going to several grocery stores. I then counted the number of red, orange, yellow, green,
and purple candies from the bag. My bag of Skittles had 11 red, 10 orange, 13 yellow, 12 green, and 11
purple Skittles.
Using the data compiled from the entire class, record the following information:
The total number of candies in the class sample = 1570 (my own bag contained 57)
Organizing and Displaying Categorical Data: Colors
Class Data for Number of Skittles by Color Pie Chart
Red, 332, 0.211465
Orange, 286, 0.182166
Yellow, 370, 0.235669
Green, 270, 0.171975
Purple, 312, 0.198726
[Figure: Pareto chart, "The Number of Candies of Each Color in the Sample". Horizontal axis: color, in the order Yellow, Red, Purple, Orange, Green. Vertical axis: number of Skittles candies, 0 to 400.]
I recorded the proportion of each color from the sample data gathered from the class. I created a
Pie Chart and a Pareto Chart for the numbers of candies of each color. To create the pie chart for
this data, I listed the color categories in the first five cells of column
Organizing and Displaying Quantitative Data: the Number of Candies per Bag
Column n Mean Std.dev Min Q1 Median Q3 Max
Total 26 60.385 3.817 47 59 61 63 65
[Figure: Skittles Project Histogram. Horizontal axis: number of candies per bag. Vertical axis: frequency, 0 to 18. Bar heights: 45-49: 1, 50-54: 1, 55-59: 6, 60-64: 17, 65-69: 1.]
The distribution in the histogram is skewed to the left, with a tail toward the lower counts, so it doesn't
have a symmetrical bell shape. The graphs that are shown above were what I expected to
see. The histogram shows that most bags (17 of 26) held between 60 and 64 candies. The
boxplot shows the five-number summary of the Skittles collected. Comparing my bag to
the class data, my bag of 57 candies falls below the class mean of 60.385, making it a bit different
from the typical bag.
Reflection
Categorical data is data that sorts observations into names or labels; any numbers used don't have a numerical meaning.
Some examples would be social security numbers and colors. Categorical data deals more with
names than numbers. Quantitative data is data that is collected with numbers that have meaning to
them, such as counts or measurements. An example would be time. Quantitative data can be discrete, with a
finite or countable number of values, or continuous, with infinitely many possible values.
For the charts, the pie chart and Pareto chart both make sense for the categorical data
because they show how the total is divided among the color categories. Having a histogram for categorical data wouldn't
make sense because a histogram groups numerical values into intervals and shows how often values fall in
each interval. For quantitative data, the histogram and boxplot make sense because they show the
distribution of the number values. A pie chart wouldn't work because the bag totals are numerical values
rather than categories that make up parts of a whole.
Confidence Interval Estimates
A confidence interval is a range (or an interval) of values used to estimate the true value of a
population parameter. The purpose is to estimate a true population value, such as a proportion, from sample data.
Construct a 99% confidence interval estimate for the true proportion of yellow candies.
Construct a 95% confidence interval estimate for the true mean number of candies per bag.
Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag.
With a 98% confidence level, if the experiment were repeated many times, about 98% of the
interval estimates constructed this way would contain the true population standard deviation of the
number of candies per bag.
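The three interval estimates requested above can be sketched in Python using the sample values from the earlier tables (370 yellow out of 1570 candies; n = 26 bags, mean 60.385, standard deviation 3.817). This is an illustrative sketch, not the original calculation: the function names are my own, the z value is computed with the standard library, and the t and chi-square critical values for 25 degrees of freedom are assumed from standard tables.

```python
from math import sqrt
from statistics import NormalDist

# Sample values taken from the tables earlier in the report.
N_CANDIES, YELLOW = 1570, 370          # class totals
N_BAGS, MEAN, SD = 26, 60.385, 3.817   # candies-per-bag summary

# Critical values. z is computed; the t and chi-square values are
# assumptions read from standard tables with 25 degrees of freedom.
Z_99 = NormalDist().inv_cdf(1 - 0.01 / 2)    # two-tailed, 99% confidence
T_95_DF25 = 2.060                            # two-tailed, 95% confidence
CHI2_LEFT_98, CHI2_RIGHT_98 = 11.524, 44.314  # 98% confidence


def proportion_ci(x, n, z):
    """Interval for a proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_ci(mean, sd, n, t):
    """Interval for a mean with sigma unknown: mean +/- t * s / sqrt(n)."""
    margin = t * sd / sqrt(n)
    return mean - margin, mean + margin


def sd_ci(sd, n, chi2_left, chi2_right):
    """Interval for a standard deviation using the chi-square distribution."""
    lower = sqrt((n - 1) * sd ** 2 / chi2_right)
    upper = sqrt((n - 1) * sd ** 2 / chi2_left)
    return lower, upper


yellow_ci = proportion_ci(YELLOW, N_CANDIES, Z_99)
bag_mean_ci = mean_ci(MEAN, SD, N_BAGS, T_95_DF25)
bag_sd_ci = sd_ci(SD, N_BAGS, CHI2_LEFT_98, CHI2_RIGHT_98)

print("99% CI, proportion of yellow:", yellow_ci)
print("95% CI, mean candies per bag:", bag_mean_ci)
print("98% CI, standard deviation:  ", bag_sd_ci)
```

Under these assumptions the mean interval comes out at roughly 58.843 to 61.927, which agrees with the interval quoted in the hypothesis test discussion.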
Hypothesis Tests
Hypothesis testing refers to the formal procedures used in statistical analysis to accept or reject
statistical hypotheses. A statistical hypothesis is an assumption about a population parameter. This
assumption may or may not be true. The usual process of hypothesis testing consists of several
steps. A basic outline is as follows:
• Formulate the null hypothesis (H0) and the alternate hypothesis (H1).
• Identify a test statistic that can be used to assess the truth of the null hypothesis.
• Draw a graph to include the test statistic, critical values, and critical region (if using
the critical value method).
• Reject the null hypothesis (H0) if the test statistic is in the critical region. Fail to reject the null
hypothesis if the test statistic is not in the critical region.
• Restate this previous decision in simple, non-technical terms, and address the original claim.
Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
Use a 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles is 55.
The first hypothesis test we had to do was to test the claim that 20% of all Skittles candies are red.
With the data collected from our sample, our proportion of red Skittles came to 21.1%. After
calculating the z test statistic and the critical values at a 0.05 significance level, we found that a 20%
proportion of red Skittles is very plausible, and therefore we failed to reject the claim. Below are the
calculations for the hypothesis test.
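A minimal sketch of this test in Python, assuming a two-tailed z test for a proportion with the class counts (332 red out of 1570 candies). The function name proportion_z_test is hypothetical, and the p-value is included only as an extra check alongside the critical value method.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0, alpha):
    """Two-tailed z test of H0: p = p0 against H1: p != p0."""
    p_hat = x / n
    # The standard error uses the claimed proportion p0, not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > z_crit
    return z, z_crit, p_value, reject


z, z_crit, p_value, reject = proportion_z_test(332, 1570, 0.20, 0.05)

print(f"z = {z:.3f}, critical values = +/-{z_crit:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```

With these inputs the test statistic falls inside the critical values, which matches the decision to fail to reject the claim.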
The second hypothesis test we were to complete was to test the claim that the mean number
of candies in a bag of Skittles is 55. As discussed in the second confidence interval estimate we did,
our sample mean was 60.385, and with a 95% confidence interval we determined that the true mean
was between 58.843 and 61.927. With that information we had a pretty good idea that we would be
rejecting the claim. By calculating the critical values with a 0.01 significance level, we found
that in order to fail to reject the claim, our t value must fall between -2.787 and 2.787. In reality, the
t value came out to be 7.194, which is far outside that range, and therefore we
reject the claim that the mean number of candies in a bag of Skittles is 55.
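The t value quoted above can be checked with a short Python sketch, assuming a two-tailed one-sample t test with the summary values from the table (n = 26, mean 60.385, standard deviation 3.817). The function name is my own, and the critical value 2.787 is taken from the report rather than computed.

```python
from math import sqrt


def one_sample_t(sample_mean, sample_sd, n, mu0):
    """Test statistic for H0: mu = mu0 when sigma is unknown."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


T_CRIT = 2.787  # two-tailed, alpha = 0.01, 25 degrees of freedom (from the report)

t_stat = one_sample_t(60.385, 3.817, 26, 55)
reject_mean_claim = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}")
print("Reject H0" if reject_mean_claim else "Fail to reject H0")
```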
Reflection
To get accurate statistics when determining an interval estimate and performing a
hypothesis test, there are certain requirements that must be met. For constructing a confidence
interval estimate for a population proportion, the requirements are as follows: the sample is a simple
random sample, the conditions for the binomial distribution are satisfied, and there are at least 5
successes and at least 5 failures. For constructing a confidence interval estimate for a population
mean, the requirements are: the sample is a simple random sample, and the population is
normally distributed or n > 30. For constructing a confidence interval estimate for a population
standard deviation, the requirements are: the sample is a simple random sample, and the population
must have normally distributed values even if the sample is large. For testing a claim about a
population proportion, the requirements are: the sample observations are a simple random sample,
the conditions for a binomial distribution are satisfied, and the conditions np ≥ 5 and nq ≥ 5 are
both satisfied. For testing a claim about a population mean, the requirements are: the sample is a
simple random sample, and the population is normally distributed or n > 30. For those requiring
conditions for a binomial distribution to be satisfied, that means that there is a fixed number of
independent trials having constant probabilities, and each trial has two outcome categories of success
or failure.
One possible error that occurs from using this data has to do with our sample size. We
only used 26 bags of Skittles, and because our sample does not appear to be normally
distributed, the requirement for the mean is that the sample size be greater than 30. Another possible error could occur
from inaccurate information, whether the data was recorded wrong or some students got the
wrong size bag of Skittles. This sample could be improved by using a larger sample size, and also by
verifying the data submitted. Students could count the Skittles in class and have another classmate
double check their work to be sure the data was submitted correctly.
It is very interesting and
helpful to see how all these math problems are applicable to real life. This seems like a very
effective way to determine the quality control of a product a manufacturer is supplying. It is also
helpful to see how a consumer can verify that they are getting the right amount of product they are
paying for.
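The sample-size requirements discussed in this reflection can be checked with a small Python sketch. The function names are hypothetical, and treating the population as not normally distributed is an assumption based on the skewed histogram, not something the data proves.

```python
def proportion_requirements_met(n, p0):
    """Check np >= 5 and nq >= 5 for a claimed proportion p0."""
    return n * p0 >= 5 and n * (1 - p0) >= 5


def mean_requirement_met(n, population_normal):
    """Check that the population is normal or the sample size exceeds 30."""
    return population_normal or n > 30


# 1570 candies for the red-proportion test; 26 bags for the mean test.
red_ok = proportion_requirements_met(1570, 0.20)
# Assumption: the population of bag totals is not treated as normal.
mean_ok = mean_requirement_met(26, population_normal=False)

print("Proportion test requirements met:", red_ok)
print("Mean test requirement met:", mean_ok)
```

Under these assumptions the proportion test passes its check easily, while the mean test does not, which is the sample-size concern raised above.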