8/3/2019 Phase 2 Content
Survey Data Analysis
Daniel Loa
Taft College
Abstract
Data was collected through a survey and fed into a database. From this database a
sample of 50 respondents was taken and the data analysed using techniques learned in an
introductory statistics course. This analysis was done in order to answer a series of questions
regarding the relationship of different random variables such as gender, income, political
association, handedness and opinion on several issues. Some results were inconclusive due to
the size of the sample as explained in the results section.
Introduction
The purpose of our research here was to better understand how to conduct a scientific
study and write a proper report. We also wanted to get some practice using the statistical
techniques we have learned throughout this course. We accomplished this by conducting a
study on a population constructed by collecting data from random individuals using a survey. We
took a sample from this survey and set out to answer several questions based on the data
collected.
Our first question was whether there is a relationship between a person's height,
weight, shoe size and ring size. It was hypothesized that such a relationship exists, since a
larger person will probably have greater values for all of these than a smaller person. The
second question was whether there is a difference in income based on gender, for which it was
hypothesized that such a difference would be found.
The third question had several parts, all regarding the relationship between party association
and three other variables. The first part was to see if there was a relationship between party
association and the respondent believing Obama would be reelected. The second was if they
were in favor of the health care bill as passed and the third variable was their stance on the
death penalty. For all of these I hypothesized that a relationship existed since these issues are
often divided along party lines.
The next question also had more than one part to it. It asked whether there was a
relationship between handedness and a person's stance on the death penalty, and between
handedness and the amount of water they drank. For both it was hypothesized that no
relationship would be found.
After this I wanted to see who was more likely to switch to the Tea Party, Democrats or
Republicans. It was hypothesized that Republicans are more likely to switch because of the Tea
Party's clearly conservative ideals.
The last question that will be answered from the data collected was whether students
were less likely to work more than 30 hours a week than non-students. The hypothesis for this
was that students were less likely to work more than 30 hours a week.
How the data to answer these questions was collected, and how it was analyzed to answer
them, is explained in the following sections.
Method
Participants
The people that took the survey were individuals randomly selected by students in
several different statistics classes taught in Bakersfield, CA and Taft, CA. It is likely a majority of
the individuals that took the survey reside in this general area.
Materials
Survey: This was a survey that consisted of 23 simple questions the participants were
asked to answer. A copy of this survey can be found in the appendix.
Database: We used an online database to store the data we collected.
Procedure
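First off, everyone in the class surveyed 10 random people. They could give the survey
to anyone they wanted. Once each student had collected their 10 samples they added them to a
database that eventually grew into having data for 2,628 individuals. From this data a random
1-in-k sample of n=50 was taken. In this case we ended up with k=52 and got 16 as a random
integer.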
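The 1-in-k selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name systematic_sample is hypothetical, records are assumed to be numbered 1 through 2,628, and the random integer 16 is treated as the position of the first record selected.

```python
def systematic_sample(population_size, sample_size, start):
    """Return k and the 1-based record positions of a 1-in-k sample.

    Assumption: k is the population size divided by the sample size,
    rounded down, and `start` is a random integer between 1 and k.
    """
    k = population_size // sample_size
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return k, [start + i * k for i in range(sample_size)]


# Values reported in the procedure: 2,628 records, n = 50, random start 16.
k, positions = systematic_sample(2628, 50, 16)
print(k)               # interval between selected records
print(positions[:3])   # first few selected positions
print(positions[-1])   # last selected position
```

Under these assumptions the interval works out to 52 and the last selected position stays inside the 2,628 records, so exactly 50 individuals are drawn.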
We then proceeded to analyze this data in order to test our hypotheses. The analysis of
this sample data can be found in the following section.
Results
Each of the questions is addressed below with the data used to test it and a p-value.
The hypothesis tests and assumptions can be found in the appendix. All p-values were compared
against an alpha value of 0.05.
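I predicted that there would be a relationship between a person's height, weight, ring size
and shoe size. In order to get the graphs I used the stat plot function and took a picture of the
results. To get the sample statistics, I entered the data in a TI-84 graphing calculator and used
the 1-Var Stats command. The results are listed below. It is worth noting that [...] of these
variables appear to be distributed normally while [...] are skewed [...] a single
mode. As for testing for a relationship between all the variables I had no idea what test to use.
Using a correlation seemed inadequate because there were 4 variables instead of 2.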
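The statistics that the 1-Var Stats command reports can also be reproduced by hand. The sketch below is an illustration, not the calculator's own code: the function name one_var_stats and the shoe_sizes list are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data, leaving out the overall median when the number of values is odd.

```python
import statistics


def one_var_stats(values):
    """Mean, sample standard deviation and five-number summary."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the median position
    upper = data[half + n % 2:]  # skips the middle value when n is odd
    return {
        "Mean": statistics.mean(data),
        "S": statistics.stdev(data),  # sample standard deviation (n - 1)
        "Min": data[0],
        "Q1": statistics.median(lower),
        "M": statistics.median(data),
        "Q3": statistics.median(upper),
        "Max": data[-1],
    }


# Hypothetical shoe sizes, for illustration only (not the survey data).
shoe_sizes = [7, 8.5, 9, 9, 10, 11, 12.5]
summary = one_var_stats(shoe_sizes)
for name, value in summary.items():
    print(name, round(value, 4))
```

The keys mirror the column headings used in the summary tables of this section, so the output can be compared against a calculator row by row.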
Height histogram and summary statistics
Mean     S       Min      Q1       M        Q3       Max
66.7800  4.0972  53.0000  64.0000  67.0000  70.0000  74.0000
To analyze the distribution of weight in the sample I used the same procedure as the
previous variable.
Weight histogram and summary statistics
Mean      S        Min  Q1   M    Q3   Max
167.9400  40.9775  85   140  169  187  290
Ring size histogram and summary statistics
Mean    S       Min  Q1  M  Q3  Max
7.4167  1.7092  3    6   8  9   10
The shoe size data was also analyzed in the same manner.
Shoe size histogram and summary statistics
Mean    S       Min  Q1   M  Q3  Max
8.8000  2.0677  3    7.5  9  11  12.5
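The second question I wanted to answer with this study was, is there a difference in
gross income based on gender? My hypothesis was yes. To better understand the data I drew a
histogram and calculated the sample statistics for the income of each gender. Then I performed
a Wilcoxon Rank-Sum test that resulted in p-value=0.7604, which does not support the hypothesis.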
Male income histogram and summar statistics
Mean        S            Min  Q1     M      Q3     Max
53068.1818  688832.8368  0    13000  30000  69000  300000
Female income histogram and summary statistics
Mean        S           Min  Q1    M      Q3     Max
32259.4286  29747.4214  0    4000  30000  48500  110000
Our next question was to see if there was a relationship between political party and a
respondent's opinion on a series of issues. The data is summarized in a series of tables below
comparing the respondent's party and their response to an issue. Reporting the data in this
manner makes it easier to perform tests of independence.
The first part was to see if there was a relationship between party and whether or not the
respondent believed Obama would be reelected. The hypothesis for this was that the two
variables were dependent. However, the results were inconclusive when the Independent and
Other columns were included, even if they were combined. Further collapsing of the table was
not possible.
Obama reelection
     Republican  Democrat  Independent  Other
Yes  4           15        0            4
No   15          7         3            2
The next part was to see if there was also a relationship between party and the
respondent's approval of the health care bill. The hypothesis was that a relationship existed. In
order to meet the assumptions for the test we collapsed the Independent and Other columns
together. This returned a χ² of 10.9271 and a p-value=0.0042, which supports the hypothesis.
Health care  Republican  Democrat  Other
Yes          5           13        0
No           14          9         9
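The test statistic and p-value reported for the health care table can be checked directly from the observed counts. The sketch below is a minimal illustration, not the procedure actually used for the report; the function name chi_square_independence is hypothetical. It relies on the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is exactly exp(-x/2), so the p-value line applies only to tables with 2 degrees of freedom such as this 2 by 3 one.

```python
import math


def chi_square_independence(table):
    """Return (statistic, expected counts) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return statistic, expected


# Rows: Yes, No. Columns: Republican, Democrat, Other (Independent + Other).
health_care = [[5, 13, 0], [14, 9, 9]]
chi2, expected = chi_square_independence(health_care)

# (2 - 1) * (3 - 1) = 2 degrees of freedom, so the upper tail is exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(round(chi2, 4), round(p_value, 4))
```

Run as written, this reproduces the reported statistic of 10.9271 and p-value of 0.0042.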
The same was done for the question regarding party and stance on the death penalty.
The Independent and Other columns were combined and a test of independence was
performed. This produced a χ² of 0.1062 and a p-value=0.9483, which does not support the
hypothesis.
Death penalty  Republican  Democrat  Other
Yes            13          14        6
No             6           8         3
I then wanted to see if a person's handedness had an effect on their opinion about the
death penalty and the amount of water they drank. I predicted that a person's handedness should
have no effect on either of these. To check the first part of this I collected the data in a table in
order to perform a test of independence. This resulted in χ²=2.6364 and p-value=0.2676, which
would support my hypothesis. However, these values had to be disregarded because the
assumptions of the test were not met. Our results are inconclusive.
Death penalty  Right  Left  Ambidextrous
Yes            18     4     1
No             25     1     1
For the next question, I drew histograms and calculated the summary statistics for the
water drunk by people of each handedness. The question was whether there was a relationship
between handedness and water drunk. The hypothesis was that there was no difference
between handedness groups. The ambidextrous people were not considered because there were
only 2 in the sample, too few to represent that group properly. A rank-sum test resulted in a
p-value of 0.5375, which is consistent with the hypothesis.
Water drank by left-handed people histogram and summary statistics
Mean     S        Min  Q1  M   Q3    Max
55.4000  39.1127  8    16  60  92.5  100
Water drank by right-handed people histogram and summary statistics
Mean S Min Q1 M Q3 Max
76.9767 60.5941 1.0000 32.0000 60.0000 120.0000 256.0000
The next question was which party people in the Tea Party were most likely to come
from. My hypothesis was that most of the members of the Tea Party would come from the
Republican party. The former party membership is shown in the bar graph below.
This question was slightly tricky. I wanted to perform a goodness of fit test but for that I
needed the true proportions, which I didn't have. So I decided to use the proportion of each
party's membership in the sample as a replacement since it seems reasonable to test against
these values. The Independent and Other categories were grouped together and the test was
performed, resulting in a p-value=0.8453 that had no meaning due to a violation of assumptions.
This test was inconclusive.
                      Democrats  Republicans  Other
Members               22         19           9
Proportion of sample  44%        38%          18%
Former party association of Tea Party members
Our last question was whether students are less likely than non-students to work 30 hours
or more a week. I predicted that students would be more likely to work less than 30 hours a week.
This data was then put into a table and a test of independence was performed. The χ² was
13.1335 and the p-value was 0.0003. This supports the hypothesis.
              Less than 30  30 or more
Students      18            8
Non-students  6             23
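For a 2 by 2 table such as this one, the chi-square statistic has a closed form, and with 1 degree of freedom the p-value can be written with the complementary error function. The sketch below is an illustration only: the function name chi_square_2x2 is hypothetical, the row and column meanings are taken from the surrounding text, and no continuity correction is applied, which is an assumption about how the reported value was obtained.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Assumed layout. Rows: students, non-students.
# Columns: fewer than 30 hours, 30 hours or more.
chi2, p_value = chi_square_2x2(18, 8, 6, 23)
print(round(chi2, 4), round(p_value, 4))
```

Under these assumptions the result agrees with the reported 13.1335 and 0.0003.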
Discussion
O . W ,
.
T , I
. T I .
T . I -
- 0.7604,
.
T I .
T O
. T I
I . T
. T
. T
. M
.
A .
A, . F
of this, however, a rank-sum test supported my hypothesis that handedness has nothing to do with
how much water somebody drinks.
The results of the following question should be treated with caution since estimates were used in
lieu of true proportions. The question was whether individuals from one party were more likely to
switch to the Tea Party. My hypothesis that members of one party were more likely to join the Tea
Party could be neither supported nor discredited because the assumptions for the test were not
met, so the p-value was practically useless.
The last question was answered using a test of independence that showed being a
student and working more than 30 hours a week were linked.
Shortly after beginning the analysis it became clear that 50 was not a big enough sample
size to reliably answer all of the questions proposed in the introduction. This sample size was
selected because it appeared to be a manageable size. However, it turned out to be too small to
the point that some statistical tests could not be performed. A better sample size would have
been 75 or 100. This would have substantially increased the effort needed to analyse the data
but at the same time produced better results.
The main purpose of this project was to be a learning experience and provide a
meaningful way to practice the techniques learned in the course.
Appendix
The raw data is available upon request. Here are the assumptions and work for each test
performed on the data. All of the following p-values were compared against alpha=0.05.
2: For the second question I first set up a t-test with
H0: M = F    HA: M > F    Alpha = 0.05
Checking the assumptions showed that normality was violated, as seen in the normal plots. This
led us to switch to a non-parametric test, the Wilcoxon rank-sum, with the same hypotheses. This
resulted in a p-value of 0.7604.
There was not enough evidence to suggest median female income was less than median male
income.
[Normal plots: Female Income, Male Income]
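Since the raw incomes are not reproduced here, the rank-sum calculation can only be illustrated on made-up numbers. The sketch below uses the normal approximation to the Wilcoxon rank-sum statistic without a tie or continuity correction; the function name rank_sum_test and both income lists are hypothetical, and a calculator or statistics package may use an exact or corrected method that gives a slightly different p-value.

```python
import math


def rank_sum_test(sample_a, sample_b):
    """One-sided rank-sum test of H_A: sample_a tends to be larger.

    Returns (W, z, p), where W is the sum of the ranks of sample_a.
    Uses the normal approximation with no tie or continuity correction.
    """
    combined = sorted(sample_a + sample_b)
    # Average rank for each distinct value (ties share the mean rank).
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + 1 + j) / 2
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    w = sum(rank_of[x] for x in sample_a)
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return w, z, p


# Hypothetical incomes, for illustration only (not the survey data).
male = [13000, 30000, 69000, 42000]
female = [4000, 30000, 48500, 25000]
w, z, p = rank_sum_test(male, female)
print(w, round(z, 4), round(p, 4))
```

A p-value above alpha = 0.05 from such a calculation means the null hypothesis of equal medians is not rejected, which is the situation reported for the survey data.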
3a: For the next section I did a test of independence for each of the tables. The first one was
between party and opinion about Obama's reelection.
H0: Party and belief that Obama will be reelected are independent.
HA: Party and belief that Obama will be reelected are dependent.
This first test of independence returned χ² = 12.7055 and p-value = 0.0053. However, the
assumptions were violated because half of the expected values were below 5.
8.74   10.12  1.38  2.76
10.26  11.88  1.62  3.24
Even when the table was collapsed to have the Other and Independent columns together, the
assumptions were still violated by the expected values since a third were below 5. This
prevented a conclusion from being reached.
8.74   10.12  4.14
10.26  11.88  4.86
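The expected counts, and the share of them that fall below 5, can be computed directly from the observed table. The sketch below is an illustration with hypothetical function names; the 20% cutoff in the comment is a common rule of thumb assumed here, since the report does not state the exact threshold it applied.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_below(expected, cutoff=5):
    """Fraction of expected counts that are smaller than the cutoff."""
    cells = [value for row in expected for value in row]
    return sum(value < cutoff for value in cells) / len(cells)


# Rows: Yes, No. Columns: Republican, Democrat, Independent, Other.
obama = [[4, 15, 0, 4], [15, 7, 3, 2]]
# Same table with the Independent and Other columns combined.
obama_collapsed = [[4, 15, 4], [15, 7, 5]]

# Assumed rule of thumb: at most 20% of expected counts may be below 5.
for table in (obama, obama_collapsed):
    share = share_below(expected_counts(table))
    print(round(share, 2), "ok" if share <= 0.20 else "assumption violated")
```

Both versions of the table fail the assumed rule, with half and then a third of the expected counts below 5, which matches the description above.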
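3b: The next test was between party and approval of the health care bill.
H0: Party and approval of the health care bill are independent.
HA: Party and approval of the health care bill are dependent.
The expected values are listed below and meet the assumptions since only a sixth are below 5.
6.84   7.92   3.24
12.16  14.08  5.76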
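3c: Then we performed the same test on the last table.
H0: Party and stance on the death penalty are independent.
HA: Party and stance on the death penalty are dependent.
The expected values fulfill the assumptions so the p-value of 0.9483 is valid. There is not enough
evidence to suggest party and stance on the death penalty are dependent.
12.54  14.52  5.94
6.46   7.48   3.06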
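4a: For the next question, the first part involved a test of independence between handedness and
opinion on the death penalty.
H0: Handedness and stance on the death penalty are independent.
HA: Handedness and stance on the death penalty are dependent.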
The expected values clearly show that the assumptions are violated so the p-value has no
meaning. This test was inconclusive.
19.78 2.3 0.92
23.22 2.7 1.08
4b: The next part was finding a relationship between handedness and the amount of water drunk.
I decided to only compare left-handed and right-handed people because there were only two
ambidextrous people in the sample. I was unable to perform a t-test because the normal plots
showed that the normality assumption was violated.
H0: R = L    HA: R ≠ L    Alpha = 0.05
[Normal plots: Left handed, Right handed]
This led me to switch to a rank-sum test, which does not require normality and returned a
p-value of 0.5375. This meant the null hypothesis was not rejected. There is not enough evidence
to suggest right-handed people drink different amounts of water than left-handed people.
6: For the sixth and final question I performed a test of independence.
H0: Being a student and working more than 30 hours are independent.
HA: Being a student and working more than 30 hours are dependent.
The test resulted in χ² = 13.1335 and a p-value of 0.0003, and the assumptions are met by
the expected values. There is enough evidence to suggest being a student and working more than
30 hours are dependent.
11.345  14.655
12.655  16.345
Questions contained in the survey
1. Gender: Male Female
2. Ethnicity: White Black Hispanic Other
3. Age (years):
4. Height (in inches, so 5 ft 7 inches would be 67):
5. Weight (pounds):
6. Hours worked per week:
7. Are you currently a Student? Yes No
8. Education Level (completed, not in progress):
(High School Grad = 12, Associate Degree = 14,
BA/BS = 16, MA/MS = 18, PhD = 20)
9. Annual Gross Income (numbers only, $39000 is 39000, not 39,000 or $39K):
10. Eye color: Brown Black Blue Hazel Each eye is a different color Other
11. Natural hair color. Black Blond Brown or Brunette Grey Red Silver
Other
12. Number of ounces of water you drank for the two days prior to submitting this survey.
One cup of water is 8 ounces whereas most glasses are 12 or 16 ounces.
13. Are you in favor of the death penalty? Yes No
In this context in favor means you are for the death penalty for at least one,
but not necessarily all, crimes that are currently punishable by death.
14. What political party do you most closely associate yourself with? By associate, I mean
what party are you registered to vote with or if you are not registered, which party
would you register with if you had to choose. Democrat Republican Independent
Other
15. Are you registered to vote? Yes No
16. Do you personally know anyone that has been infected with the HIV virus?
Yes No
By personally know, I mean a personal knowledge of that person. As an example, we all
probably know Magic Johnson, but I doubt any of us have actually met him or are
friends with him. If Magic is the only person you know who has been infected with
HIV, then your answer to this question would be no.
17. Are you in favor of the health care bill as passed? Yes No Undecided
18. Are you left-handed, right-handed, or ambidextrous?
19. Do you believe President Obama will be reelected? Yes No Uncertain
20. Change of Party Affiliation
I considered myself a Democrat or Liberal and now associate myself with the Tea
Party.
I considered myself a Republican or Conservative and now associate myself with the
Tea Party.
I considered myself other than Democrat or Republican and now associate myself
with the Tea Party.
Not applicable to me.
21. Consider Proposition 8, the 2008 proposition regarding marriage for same sex couples. The
proposition defined marriage as between a man and a woman and prohibits same sex couples
from marrying. Do you agree with Proposition 8? Yes No
22. What is your shoe size?
23. What is your ring size?