Phase 2 Content


  • 8/3/2019 Phase 2 Content


    Survey Data Analysis

    Daniel Loa

    Taft College


    Abstract

    Data was collected through a survey and fed into a database. From this database a sample of 50 respondents was taken, and the data was analysed using techniques learned in an introductory statistics course. This analysis was done in order to answer a series of questions regarding the relationships between variables such as gender, income, political association, handedness and opinion on several issues. Some results were inconclusive due to the small size of the sample, as explained in the results section.

    Introduction

    The purpose of our research here was to better understand how to conduct a scientific study and write a proper report. We also wanted to get some practice using the statistical techniques we have learned throughout this course. We accomplished this by conducting a study on a population constructed by collecting data from random individuals using a survey. We took a sample from the survey data and set out to answer several questions based on the data collected.

    Our first question was whether there is a relationship between a person's height, weight, shoe size and ring size. It was hypothesized that such a relationship exists, since a larger person will probably have greater values for all of these than a smaller person. The second question was whether there was a difference in income based on gender, and it was hypothesized that such a difference would be found.

    The third question had several parts, all regarding the relationship between party association and three other variables. The first part was to see if there was a relationship between party association and the respondent believing Obama would be reelected. The second was whether they were in favor of the health care bill as passed, and the third variable was their stance on the death penalty. For all of these I hypothesized that a relationship existed, since these issues are often divided along party lines.

    The next question also had more than one part. It was whether there was a relationship between handedness and a person's stance on the death penalty, and between handedness and the amount of water they drank. For both it was hypothesized that no relationship would be found.

    After this I wanted to see who was more likely to switch to the Tea Party, Democrats or Republicans. It was hypothesized that Republicans are more likely to switch because of the Tea Party's clearly conservative ideals.

    The last question that will be answered from the data collected was whether students

    were less likely to work more than 30 hours a week than non-students. The hypothesis for this

    was that students were less likely to work more than 30 hours a week.

    How the data to answer these questions was collected, and how it was analyzed, is explained in the following sections.

    Method

    Participants

    The people that took the survey were individuals randomly selected by students in several different statistics classes taught in Bakersfield, CA and Taft, CA. It is likely that a majority of the individuals that took the survey reside in this general area.

    Materials

    Survey: This was a survey that consisted of 23 simple questions the participants were

    asked to answer. A copy of this survey can be found in the appendix.

    Database: We used an online database to store the data we collected.

    Procedure

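    First off, everyone in the class surveyed 10 random people. They could give the survey to anyone they wanted. Once each student had collected their 10 samples, they added them to a database that eventually grew to hold data from 2,628 individuals. From this data a random 1-in-k sample of n = 50 was taken. In this case we ended up with k = 52 and got 16 as our random integer.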
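    A 1-in-k systematic sample like the one described can be sketched as follows. This is a minimal illustration, assuming the database records are numbered 1 to 2,628 and that the random integer 16 is the first record taken; the function name `systematic_sample_ids` is hypothetical.

```python
def systematic_sample_ids(population_size: int, sample_size: int, start: int) -> list[int]:
    """Record numbers chosen by a 1-in-k systematic sample.

    Assumes records are numbered 1..population_size and that `start`
    (the random integer) is the first record taken.
    """
    k = population_size // sample_size  # 2628 // 50 == 52
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # Take the starting record, then every k-th record after it.
    return [start + i * k for i in range(sample_size)]


ids = systematic_sample_ids(2628, 50, 16)
print(ids[:3], ids[-1])  # [16, 68, 120] 2564
```

    With k = 52 the last record taken is 2,564, so the 50 picks never run past the end of the database.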

    We then proceeded to analyze this data in order to test our hypotheses. The analysis of this sample data can be found in the following section.

    Results

    Each of the questions is addressed below with the data used to test it and a p-value. The hypothesis tests and assumptions can be found in the appendix. All p-values were compared against an alpha value of 0.05.

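    I predicted that there would be a relationship between a person's height, weight, ring size and shoe size. In order to get the graphs I used the [...] function and a picture of the results. To get the sample statistics, I entered the data into a TI-84 graphing calculator and used the 1-Var Stats command. The results are listed below. It is worth noting that [...] of these variables appear to be distributed normally, while [...] seemed to have [...] a single mode. As for testing for a relationship between all the variables, I had no idea what test to use. Using a correlation seemed inadequate because there were 4 variables instead of 2.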
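    For reference, one common way to look at four variables at once is to compute a correlation for every pair of them (four variables give six pairs). This is not what the report did; it is an assumed alternative, shown with hypothetical numbers because the raw data is not included here, and `pearson_r` is a hypothetical helper name.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL measurements for five respondents, not the survey data.
data = {
    "height": [62, 65, 67, 70, 73],
    "weight": [120, 140, 165, 180, 210],
    "shoe": [6.5, 8, 9, 10, 12],
    "ring": [5, 6, 7, 9, 10],
}

# One coefficient per pair of variables.
for a, b in combinations(data, 2):
    print(f"{a} vs {b}: r = {pearson_r(data[a], data[b]):.3f}")
```

    Six separate coefficients each describe only one pair, so this is a descriptive summary rather than a single test of all four variables together.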


    Height histogram and summary statistics

    Mean      Std. Dev.   Min       Q1        Median    Q3        Max
    66.7800   4.0972      53.0000   64.0000   67.0000   70.0000   74.0000
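    The summary statistics in these tables came from the TI-84 1-Var Stats command. A rough Python equivalent is sketched below, assuming quartiles are the medians of the lower and upper halves with the overall median left out when n is odd (the convention assumed here for the calculator; other software may report slightly different quartiles). The function name `one_var_summary` is hypothetical.

```python
import statistics


def one_var_summary(data):
    """Mean, sample SD and five-number summary of a list (n >= 2).

    Quartiles are the medians of the lower and upper halves, leaving the
    overall median out of both halves when n is odd (assumed calculator
    convention).
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return {
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),  # sample SD, n - 1 in the denominator
        "min": xs[0],
        "q1": statistics.median(lower),
        "median": statistics.median(xs),
        "q3": statistics.median(upper),
        "max": xs[-1],
    }


# HYPOTHETICAL heights in inches, not the survey data.
print(one_var_summary([62, 64, 66, 67, 67, 70, 71]))
```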

    To analyze the distribution of weight in the sample I used the same procedure as for the previous variable.


    Weight histogram and summary statistics

    Mean       Std. Dev.   Min   Q1    Median   Q3    Max
    167.9400   40.9775     85    140   169      187   290


    Ring size histogram and summary statistics

    Mean     Std. Dev.   Min   Q1   Median   Q3   Max
    7.4167   1.7092      3     6    8        9    10

    The shoe size data was also analyzed in the same manner.


    Shoe size histogram and summary statistics

    Mean     Std. Dev.   Min   Q1    Median   Q3   Max
    8.8000   2.0677      3     7.5   9        11   12.5

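    The second question I wanted to answer with this study was: is there a difference in gross income based on gender? My hypothesis was yes. To better understand the data I drew a histogram and calculated the sample statistics for the income of each gender. Then I performed a Wilcoxon rank-sum test that resulted in a p-value = 0.7604, which does not support the hypothesis.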
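    A minimal sketch of the Wilcoxon rank-sum test, using the large-sample normal approximation with average ranks for ties. It applies no tie or continuity correction, so its p-values may differ somewhat from the software used in the report; the function names and income values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum W (sum of the ranks of x) with a normal
    approximation; no tie or continuity correction is applied."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_greater = 0.5 * math.erfc(z / math.sqrt(2))   # HA: x tends to be larger
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # HA: x and y differ
    return w, z, p_greater, p_two_sided


# HYPOTHETICAL incomes, not the survey data.
male = [0, 15000, 30000, 52000, 90000, 120000]
female = [0, 5000, 28000, 31000, 45000]
print(rank_sum_test(male, female))
```

    The one-sided p-value matches an alternative like HA: M > F, while the two-sided value fits a "different" alternative such as the one used for handedness and water.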


    Male income histogram and summary statistics

    Mean         Std. Dev.     Min   Q1      Median   Q3      Max
    53068.1818   688832.8368   0     13000   30000    69000   300000


    Female income histogram and summary statistics

    Mean         Std. Dev.    Min   Q1     Median   Q3      Max
    32259.4286   29747.4214   0     4000   30000    48500   110000

    Our next question was to see if there was a relationship between political party and a respondent's opinion on a series of issues. The data is summarized in a series of tables below, each comparing the respondent's party with their response to one issue. Reporting the data in this manner makes it easier to perform tests of independence.
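    The test of independence used throughout this section compares each observed count O with the count expected if the variables were independent, E = (row total × column total) / n, and sums (O - E)²/E over all cells; the degrees of freedom are (rows - 1)(columns - 1). The sketch below uses closed forms of the chi-square tail probability for integer degrees of freedom; the function names are hypothetical. Run on the health care table (Independent and Other combined), it reproduces the X² and p-value the report gives for that table.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df, via closed
    forms of the regularized upper incomplete gamma function Q(df/2, x/2)."""
    y = x / 2
    if df % 2 == 0:
        # Q(m, y) = e^(-y) * sum_{k=0}^{m-1} y^k / k!
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Q(m + 1/2, y) = erfc(sqrt(y)) + e^(-y) * sum_{k=0}^{m-1} y^(k+1/2) / Gamma(k + 3/2)
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(
        y ** (k + 0.5) / math.gamma(k + 1.5) for k in range((df - 1) // 2)
    )


def independence_test(table):
    """Chi-square test of independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df), expected


health = [[5, 13, 0],   # Yes: Republican, Democrat, Other
          [14, 9, 9]]   # No
stat, df, p, expected = independence_test(health)
print(f"X2 = {stat:.4f}, df = {df}, p = {p:.4f}")  # X2 = 10.9271, df = 2, p = 0.0042
```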

    The first part was to see if there was a relationship between party and whether or not the respondent believed Obama would be reelected. The hypothesis for this was that the two variables were dependent; however, the results were inconclusive. With the Independent and Other columns included, the expected counts did not meet the assumptions of the test, even when those two columns were combined, and further collapsing of the table was not possible.

    Obama reelection   Republican   Democrat   Independent   Other
    Yes                4            15         0             4
    No                 15           7          3             2
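    The report judges the assumptions of each test of independence by its expected counts. The sketch below assumes a common rule of thumb (no expected count below 1 and at most 20% of them below 5); the report does not state its exact rule, but this one agrees with the verdicts it reaches for the tables shown here. It also shows combining the Independent and Other columns. All function names are hypothetical.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_counts_ok(expected, min_count=5, max_share_below=0.2):
    """Assumed rule: no expected count below 1 and at most 20% below 5."""
    cells = [e for row in expected for e in row]
    share_below = sum(e < min_count for e in cells) / len(cells)
    return min(cells) >= 1 and share_below <= max_share_below


def combine_last_two_columns(table):
    """Merge the last two columns, e.g. Independent and Other."""
    return [row[:-2] + [row[-2] + row[-1]] for row in table]


obama = [[4, 15, 0, 4],   # Yes: Republican, Democrat, Independent, Other
         [15, 7, 3, 2]]   # No
for label, table in [("four columns", obama),
                     ("Independent + Other", combine_last_two_columns(obama))]:
    exp = expected_counts(table)
    rounded = [[round(e, 2) for e in row] for row in exp]
    print(label, rounded, expected_counts_ok(exp))
```

    Both versions fail the check (half, then a third, of the expected counts are below 5), which matches the report's finding that this question was inconclusive.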

    The next part was to see if there was also a relationship between party and the respondent's approval of the health care bill. The hypothesis was that a relationship existed. In order to meet the assumptions for the test, we collapsed the Independent and Other columns together. This returned X² = 10.9271 and p-value = 0.0042, which supports the hypothesis.

    Health care bill   Republican   Democrat   Other
    Yes                5            13         0
    No                 14           9          9

    The same was done for the question regarding party and stance on the death penalty. The Independent and Other columns were combined and a test of independence was performed. This produced X² = 0.1062 and p-value = 0.9483, which does not support the hypothesis.

    Death penalty   Republican   Democrat   Other
    Yes             13           14         6
    No              6            8          3


    I then wanted to see if a person's handedness had an effect on their opinion about the death penalty and on the amount of water they drank. I predicted that a person's handedness should have no effect on either of these. To check the first part, I arranged the data in a table in order to perform a test of independence. This resulted in X² = 2.6364 and p-value = 0.2676, which would support my hypothesis; however, these values had to be disregarded because the assumptions of the test were not met. Our results are inconclusive.

    Death penalty   Right   Left   Ambidextrous
    Yes             18      4      1
    No              25      1      1

    For the next question, I drew histograms and calculated the summary statistics for the amount of water consumed by people of each handedness. The question was whether there was a relationship between handedness and water consumed, and the hypothesis was that there was no difference between the handedness groups. Ambidextrous people were not considered because there were only 2 in the sample, too few to represent that group. A rank-sum test comparing left- and right-handed people resulted in a p-value of 0.5375, which supported the hypothesis.


    Water consumed by left-handed people: histogram and summary statistics

    Mean      Std. Dev.   Min   Q1   Median   Q3     Max
    55.4000   39.1127     8     16   60       92.5   100


    Water consumed by right-handed people: histogram and summary statistics

    Mean      Std. Dev.   Min      Q1        Median    Q3         Max
    76.9767   60.5941     1.0000   32.0000   60.0000   120.0000   256.0000

    The next question was which party members of the Tea Party were most likely to have come from. My hypothesis was that most of the members of the Tea Party would come from the Republican party. The former party membership is shown in the bar graph below.

    This question was slightly tricky. I wanted to perform a goodness-of-fit test, but for that I needed the true proportions of each party, which I didn't have. So I decided to use the proportion of the sample belonging to each party as a substitute, since it seems reasonable to test against these values. The Independent and Other categories were grouped together and the test was performed, resulting in a p-value = 0.8453 that had no meaning because the assumptions of the test were violated. This test was inconclusive.
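    A goodness-of-fit test with the sample proportions standing in for the true proportions could look like the sketch below. The Tea Party counts are hypothetical, because the actual counts appear only in the bar graph. With three categories there are 2 degrees of freedom, for which the chi-square tail probability is exactly exp(-X²/2); `goodness_of_fit` is a hypothetical name.

```python
import math


def goodness_of_fit(observed, proportions):
    """Chi-square goodness-of-fit statistic and the expected counts."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


# Sample proportions used in place of the unknown true proportions.
props = [0.44, 0.38, 0.18]   # Democrats, Republicans, Other
tea_party = [1, 6, 2]        # HYPOTHETICAL counts, not the survey's
stat, expected = goodness_of_fit(tea_party, props)
p = math.exp(-stat / 2)      # chi-square tail for df = 3 - 1 = 2 only
print(expected, round(stat, 3), round(p, 3), all(e >= 5 for e in expected))
```

    With only a handful of Tea Party members, every expected count falls below 5, which illustrates how a small group can make this test's p-value uninterpretable.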


    Party                  Democrats   Republicans   Other
    Members                22          19            9
    Proportion of sample   44%         38%           18%

    Former party association of Tea Party members

    Our last question was whether students are less likely than non-students to work 30 hours or more a week. I predicted that students would be more likely to work less than 30 hours a week. The data was put into a table and a test of independence was performed. X² was 13.1335 and the p-value was 0.0003. This supports the hypothesis.
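    For a 2 × 2 table the test of independence has a shortcut, X² = n(ad - bc)² / ((a + b)(c + d)(a + c)(b + d)), with 1 degree of freedom, for which the tail probability is erfc(sqrt(X²/2)). A minimal sketch (no continuity correction; the function name is hypothetical) applied to the student table reproduces the reported values:

```python
import math


def two_by_two_chi2(a, b, c, d):
    """Shortcut X2 for the 2x2 table [[a, b], [c, d]] (no continuity
    correction) and its p-value with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))  # chi-square tail for df = 1
    return stat, p


# Students: 18 under 30 hours, 8 at 30 or more; non-students: 6 and 23.
stat, p = two_by_two_chi2(18, 8, 6, 23)
print(f"X2 = {stat:.4f}, p = {p:.4f}")  # X2 = 13.1335, p = 0.0003
```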


    Hours worked per week   Less than 30   30 or more
    Student                 18             8
    Non-student             6              23

    Discussion

    [...] of this, however, a rank-sum test found no evidence against my hypothesis that handedness has nothing to do with how much water somebody drinks.

    The results of the next question should be taken lightly, since estimates were used in lieu of true proportions. The question was whether individuals from one party were more likely to switch to the Tea Party. My hypothesis that members of one party were more likely to join the Tea Party could be neither supported nor rejected, because the assumptions for the test were not met and the p-value could not be interpreted.

    The last question was answered using a test of independence, which showed that being a student and working more than 30 hours a week are associated.

    Shortly after beginning the analysis it became clear that 50 was not a big enough sample size to reliably answer all of the questions proposed in the introduction. This sample size was selected because it appeared to be a manageable size. However, it turned out to be so small that the assumptions of some statistical tests could not be met. A better sample size would have been 75 or 100. This would have substantially increased the effort needed to analyse the data, but at the same time it would have produced better results.
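    A rough way to gauge how large a sample a given table needs: if the row and column proportions stayed the same, every expected count would grow in proportion to n, so the smallest one reaches 5 once n ≥ 5 / (smallest row share × column share). This sketch assumes exactly that, uses the stricter rule that every expected count is at least 5, and is not part of the original analysis; `min_n_for_expected` is a hypothetical name. For the Obama table with Independent and Other combined it gives 61, within the suggested 75 to 100, while tables with a very small category, such as the two ambidextrous respondents, would need considerably more.

```python
import math


def min_n_for_expected(table, threshold=5):
    """Smallest n at which every expected count n * (row share) * (column
    share) reaches `threshold`, assuming the observed shares stay fixed."""
    n = sum(sum(row) for row in table)
    row_shares = [sum(row) / n for row in table]
    col_shares = [sum(col) / n for col in zip(*table)]
    smallest = min(r * c for r in row_shares for c in col_shares)
    return math.ceil(threshold / smallest)


# Obama table with Independent and Other combined (n = 50 in the report).
obama_collapsed = [[4, 15, 4],
                   [15, 7, 5]]
print(min_n_for_expected(obama_collapsed))  # 61
```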

    The main purpose of this project was to be a learning experience and provide a

    meaningful way to practice the techniques learned in the course.

    Appendix

    The raw data is available upon request. Here are the assumptions and work for each test

    performed on the data. All of the following p-values were compared against alpha=0.05.

    2: For the second question I started with a t-test:

    Ho: M = F   HA: M > F   Alpha = 0.05

    Checking the assumptions, the normal plots showed that the normality assumption was violated. This caused me to switch to a non-parametric test, the Wilcoxon rank-sum test, which resulted in a p-value of 0.7604.
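    A normal probability plot pairs each sorted observation with the normal score expected at its position; points that fall roughly on a straight line suggest normality, while strong curvature (as is typical for skewed income data) suggests it is violated. The sketch below computes those points, assuming the plotting position (i - 0.5)/n, one common choice (calculators and statistics packages may use others); `normal_plot_points` is a hypothetical name.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """(normal score, observed value) pairs for a normal probability plot,
    using the plotting position (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


# HYPOTHETICAL right-skewed incomes, not the survey data.
for z, x in normal_plot_points([0, 4000, 12000, 30000, 45000, 110000]):
    print(f"{z:7.3f}  {x}")
```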


    There was not enough evidence to suggest median female income was less than median male

    income.

    Normal plots: Female Income, Male Income

    3a: For the next section I did a test of independence for each of the tables. The first one was between party and opinion about Obama's reelection.

    HO: Party and belief that Obama will be reelected are independent.

    HA: Party and belief that Obama will be reelected are dependent.

    This first test of independence returned X² = 12.7055 and p-value = 0.0053. However, the assumptions were violated because half of the expected values were below 5.

    Expected   Republican   Democrat   Independent   Other
    Yes        8.74         10.12      1.38          2.76
    No         10.26        11.88      1.62          3.24

    Even when the table was collapsed to have the Other and Independent columns together, the assumptions were still violated by the expected values, since a third of them were below 5. This prevented a conclusion from being reached.

    Expected   Republican   Democrat   Independent/Other
    Yes        8.74         10.12      4.14
    No         10.26        11.88      4.86

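    3b: The next test was between party and approval of the health care bill.

    HO: Party and approval of the health care bill are independent.

    HA: Party and approval of the health care bill are dependent.

    The expected values are listed below and meet the assumptions, since only one of the six is below 5.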

    Expected   Republican   Democrat   Other
    Yes        6.84         7.92       3.24
    No         12.16        14.08      5.76

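    3c: Then we performed the same test on the last table.

    HO: Party and stance on the death penalty are independent.

    HA: Party and stance on the death penalty are dependent.

    The expected values fulfill the assumptions, so the p-value of 0.9483 is valid. There is not enough evidence to suggest party and stance on the death penalty are dependent.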

    Expected   Republican   Democrat   Other
    Yes        12.54        14.52      5.94
    No         6.46         7.48       3.06

    4a: For the next question, the first part involved a test of independence between handedness and opinion on the death penalty.

    HO: Handedness and stance on the death penalty are independent.

    HA: Handedness and stance on the death penalty are dependent.

    The expected values clearly show that the assumptions are violated (four of the six are below 5, and two are below 1), so the p-value has no meaning. This test was inconclusive.

    Expected   Right   Left   Ambidextrous
    Yes        19.78   2.3    0.92
    No         23.22   2.7    1.08

    4b: The next part was testing for a relationship between handedness and the amount of water consumed. I decided to only compare left- and right-handed people because there were only two ambidextrous people in the sample. I was unable to perform a t-test because the normal plots showed that the normality assumption was violated.

    Ho: R = L   HA: R ≠ L   Alpha = 0.05

    Normal plots: Left handed, Right handed

    This forced me to switch to a rank-sum test, which does not require the normality assumption, and it returned a p-value of 0.5375. This meant the null hypothesis was not rejected. There is not enough evidence to suggest right-handed people drink different amounts of water than left-handed people.

    6: For the sixth and final question I performed a test of independence.

    HO: Being a student and working more than 30 hours are independent.

    HA: Being a student and working more than 30 hours are dependent.

    The test resulted in X² = 13.1335 and a p-value of 0.0003, and the assumptions are met by the expected values. Since 0.0003 < 0.05, the null hypothesis is rejected: there is enough evidence to suggest that being a student and working more than 30 hours a week are dependent.

    Expected      Less than 30   30 or more
    Student       11.345         14.655
    Non-student   12.655         16.345

    Questions contained in the survey

    1. Gender: Male Female

    2. Ethnicity: White Black Hispanic Other

    3. Age (years):

    4. Height (in inches, so 5 ft 7 inches would be 67):

    5. Weight (pounds):

    6. Hours worked per week:

    7. Are you currently a Student? Yes No

    8. Education Level (completed, not in progress):

    (High School Grad = 12, Associate Degree = 14,

    BA/BS = 16, MA/MS = 18, PhD = 20)

    9. Annual Gross Income (numbers only, $39000 is 39000, not 39,000 or $39K):

    10. Eye color: Brown Black Blue Hazel Each eye is a different color Other

    11. Natural hair color. Black Blond Brown or Brunette Grey Red Silver

    Other

    12. Number of ounces of water you drank for the two days prior to submitting this survey.

    One cup of water is 8 ounces whereas most glasses are 12 or 16 ounces.


    13. Are you in favor of the death penalty? Yes No

    In this context in favor means you are for the death penalty for at least one,

    but not necessarily all, crimes that are currently punishable by death.

    14. What political party do you most closely associate yourself with? By associate, I mean

    what party are you registered to vote with or if you are not registered, which party

    would you register with if you had to choose. Democrat Republican Independent

    Other

    15. Are you registered to vote? Yes No

    16. Do you personally know anyone that has been infected with the HIV virus?

    Yes No

    By personally know, I mean a personal knowledge of that person. As an example, we all

    probably know Magic Johnson, but I doubt any of us have actually met him or are

    friends with him. If Magic is the only person you know who has been infected with

    HIV, then your answer to this question would be no.

    17. Are you in favor of the health care bill as passed? Yes No Undecided

    18. Are you left-handed, right-handed, or ambidextrous?

    19. Do you believe President Obama will be reelected? Yes No Uncertain

    20. Change of Party Affiliation

    I considered myself a Democrat or Liberal and now associate myself with the Tea

    Party.

    I considered myself a Republican or Conservative and now associate myself with the

    Tea Party.

    I considered myself other than Democrat or Republican and now associate myself

    with the Tea Party.


    Not applicable to me.

    21. Consider Proposition 8, the 2008 proposition regarding marriage for same sex couples. The

    proposition defined marriage as between a man and a woman and prohibits same sex couples

    from marrying. Do you agree with Proposition 8? Yes No

    22. What is your shoe size?

    23. What is your ring size?
