Eng ISM Chapter 9 Statistics For Engineering and Sciences by Mendenhall


  • 8/13/2019 Eng ISM Chapter 9 Statistics For Engineering and Sciences by Mendenhall


    CHAPTER 9


    Categorical Data Analysis

    9.2 a. The one-way table is shown below:

    Turned Left   Turned Right   Drove Straight   Total
    357           321            294              972

    b. The form of the confidence interval is:

    $$\hat p_1 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n}}, \qquad \hat p_1 = \frac{294}{972} = .3025$$

    For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table 5, Appendix B, z.025 = 1.96. The confidence interval is:

    $$.3025 \pm 1.96\sqrt{\frac{.3025(1-.3025)}{972}} \;\Rightarrow\; .3025 \pm .0289 \;\Rightarrow\; (.2736,\ .3314)$$

    c. The form of the confidence interval is:

    $$(\hat p_1-\hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)+\hat p_2(1-\hat p_2)+2\hat p_1\hat p_2}{n}}$$

    $$\hat p_1 = \frac{357}{972} = .3673, \qquad \hat p_2 = \frac{321}{972} = .3302$$

    The confidence interval is:

    $$(.3673-.3302) \pm 1.96\sqrt{\frac{.3673(1-.3673)+.3302(1-.3302)+2(.3673)(.3302)}{972}} \;\Rightarrow\; .0371 \pm .0524 \;\Rightarrow\; (-.0153,\ .0895)$$

    We are 95% confident the difference in the proportions of cars turning left and turning right is contained between −.0153 and .0895. Because this interval contains 0, there is no evidence of a difference between the two proportions.

    9.4 a. Let p1 = the proportion of WMU students who agree that their DSIP research experience is valuable to their professional future.

    $$\hat p_1 = \frac{47}{50} = .94$$


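    The confidence interval for p1 is:

    $$\hat p_1 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1\hat q_1}{n}}$$

    For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table 5 in Appendix B, z.005 = 2.576. The 99% confidence interval for p1 is:

    $$.94 \pm 2.576\sqrt{\frac{.94(.06)}{50}} \;\Rightarrow\; .94 \pm .087 \;\Rightarrow\; (.853,\ 1.027)$$

    Since a proportion cannot exceed 1, the interval is effectively (.853, 1).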

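    b. Let p1 = the proportion of WMU students who agree that their DSIP research experience is valuable to their professional future and let p2 = the proportion of WMU students who are neutral about the statement.

    $$\hat p_1 = \frac{47}{50} = .94 \qquad\text{and}\qquad \hat p_2 = \frac{3}{50} = .06$$

    The confidence interval for p1 − p2 is:

    $$(\hat p_1-\hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)+\hat p_2(1-\hat p_2)+2\hat p_1\hat p_2}{n}}$$

    For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table 5 in Appendix B, z.005 = 2.576. The 99% confidence interval for p1 − p2 is:

    $$(.94-.06) \pm 2.576\sqrt{\frac{.94(.06)+.06(.94)+2(.94)(.06)}{50}} \;\Rightarrow\; .88 \pm .173 \;\Rightarrow\; (.707,\ 1.053)$$

    Since a difference of two proportions cannot exceed 1, the interval is effectively (.707, 1).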
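    The interval arithmetic in Exercises 9.2 and 9.4 can be checked with a short Python sketch. The helper names `prop_ci` and `multinomial_diff_ci` are illustrative, not from the text, and the critical values come from `statistics.NormalDist` rather than Table 5, so the last digit can differ slightly from the hand calculations.

```python
from math import sqrt
from statistics import NormalDist


def z_crit(conf):
    """Two-sided standard-normal critical value z_{alpha/2} for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def prop_ci(x, n, conf):
    """Large-sample CI for one cell proportion: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p = x / n
    half = z_crit(conf) * sqrt(p * (1 - p) / n)
    return p - half, p + half


def multinomial_diff_ci(x1, x2, n, conf):
    """Large-sample CI for p1 - p2 when both counts come from the SAME multinomial sample.

    The 2*p1*p2 term accounts for the negative covariance between two cell
    proportions estimated from one sample.
    """
    p1, p2 = x1 / n, x2 / n
    half = z_crit(conf) * sqrt((p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n)
    return (p1 - p2) - half, (p1 - p2) + half


ci_92b = prop_ci(294, 972, 0.95)                    # Exercise 9.2b
ci_92c = multinomial_diff_ci(357, 321, 972, 0.95)   # Exercise 9.2c
ci_94a = prop_ci(47, 50, 0.99)                      # Exercise 9.4a
ci_94b = multinomial_diff_ci(47, 3, 50, 0.99)       # Exercise 9.4b

for label, (lo, hi) in [("9.2b", ci_92b), ("9.2c", ci_92c), ("9.4a", ci_94a), ("9.4b", ci_94b)]:
    print(f"{label}: ({lo:.4f}, {hi:.4f})")
```

    The `2 * p1 * p2` term is what separates this same-sample interval from the independent-samples interval used in Exercise 9.20e.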

    9.6 The form of the interval is:

    $$(\hat p_1-\hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)+\hat p_2(1-\hat p_2)+2\hat p_1\hat p_2}{n}}$$

    $$\hat p_1 = \frac{58}{90} = .6444, \qquad \hat p_2 = \frac{15}{90} = .1667$$

    For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table 5, Appendix B, z.025 = 1.96. The 95% confidence interval is:

    $$(.6444-.1667) \pm 1.96\sqrt{\frac{.6444(.3556)+.1667(.8333)+2(.6444)(.1667)}{90}} \;\Rightarrow\; .4777 \pm .1577 \;\Rightarrow\; (.3200,\ .6354)$$

    We are 95% confident the difference between the proportion of subjects who select the brighter side up and the proportion who select the darker side up falls in the interval .3200 to .6354.


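    9.8 a. The form of the confidence interval for pC is:

    $$\hat p_C \pm z_{\alpha/2}\sqrt{\frac{\hat p_C(1-\hat p_C)}{n}}, \qquad \hat p_C = \frac{n_C}{n} = \frac{22}{100} = .22$$

    For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table 5, Appendix B, z.05 = 1.645. The confidence interval is:

    $$.22 \pm 1.645\sqrt{\frac{.22(1-.22)}{100}} \;\Rightarrow\; .22 \pm .068 \;\Rightarrow\; (.152,\ .288)$$

    b. The form of the confidence interval for (pE − pB) is:

    $$(\hat p_E-\hat p_B) \pm z_{\alpha/2}\sqrt{\frac{\hat p_E(1-\hat p_E)+\hat p_B(1-\hat p_B)+2\hat p_E\hat p_B}{n}}$$

    $$\hat p_E = \frac{n_E}{n} = \frac{19}{100} = .19, \qquad \hat p_B = \frac{n_B}{n} = \frac{27}{100} = .27$$

    Using the information from part a, the confidence interval is:

    $$(.19-.27) \pm 1.645\sqrt{\frac{.19(1-.19)+.27(1-.27)+2(.19)(.27)}{100}} \;\Rightarrow\; -.08 \pm .111 \;\Rightarrow\; (-.191,\ .031)$$

    c. The sample proportions are:

    $$\hat p_A = \frac{n_A}{n} = \frac{17}{100} = .17, \qquad \hat p_D = \frac{n_D}{n} = \frac{15}{100} = .15$$

    Using the information from part b, the confidence interval is:

    $$(.17-.15) \pm 1.645\sqrt{\frac{.17(1-.17)+.15(1-.15)+2(.17)(.15)}{100}} \;\Rightarrow\; .02 \pm .093 \;\Rightarrow\; (-.073,\ .113)$$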
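    The same large-sample formulas reproduce the intervals in Exercises 9.6 and 9.8 (95% for 9.6, 90% for 9.8). This is an illustrative sketch: the function names are hypothetical, the sample proportions are passed in directly, and z comes from `statistics.NormalDist` instead of Table 5.

```python
from math import sqrt
from statistics import NormalDist


def z_crit(conf):
    """Two-sided standard-normal critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def one_prop_ci(p, n, conf):
    """p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    half = z_crit(conf) * sqrt(p * (1 - p) / n)
    return p - half, p + half


def same_sample_diff_ci(p1, p2, n, conf):
    """CI for p1 - p2 with both proportions estimated from one multinomial sample of size n."""
    var = (p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n
    half = z_crit(conf) * sqrt(var)
    return (p1 - p2) - half, (p1 - p2) + half


ci_96 = same_sample_diff_ci(58 / 90, 15 / 90, 90, 0.95)   # Exercise 9.6
ci_98a = one_prop_ci(0.22, 100, 0.90)                     # Exercise 9.8a
ci_98b = same_sample_diff_ci(0.19, 0.27, 100, 0.90)       # Exercise 9.8b
ci_98c = same_sample_diff_ci(0.17, 0.15, 100, 0.90)       # Exercise 9.8c

for label, (lo, hi) in [("9.6", ci_96), ("9.8a", ci_98a), ("9.8b", ci_98b), ("9.8c", ci_98c)]:
    print(f"{label}: ({lo:.3f}, {hi:.3f})")
```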


    9.10 a. To determine if the opinions of Internet users are evenly divided among the four categories, we test:

    H0: p1 = p2 = p3 = p4 = .25
    Ha: At least two of the proportions differ

    b. The expected numbers in each category are:

    E(ni) = npi = 328(.25) = 82

    The test statistic is:

    $$\chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(59-82)^2}{82} + \frac{(108-82)^2}{82} + \frac{(82-82)^2}{82} + \frac{(79-82)^2}{82} = 14.805$$

    The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table 8 in Appendix B, χ².05 = 7.81473. The rejection region is χ² > 7.81473.

    Since the observed value of the test statistic falls in the rejection region (χ² = 14.805 > 7.81473), H0 is rejected. There is sufficient evidence to indicate that the opinions of Internet users are not evenly divided among the four categories.

    c. A Type I error would occur if we conclude that differences exist when, in fact, they do not.

    A Type II error would occur if we conclude that no differences exist when, in fact, they do.

    d. The expected cell counts must all be at least five, and the multinomial assumptions must be met.

    9.12 To determine if there are significant differences in the percentages of incidents in the four cause categories, we test:

    H0: p1 = p2 = p3 = p4 = .25
    Ha: At least two of the proportions differ

    The expected numbers in each category are:

    E(ni) = npi = 83(.25) = 20.75


    The test statistic is:

    $$\chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(27-20.75)^2}{20.75} + \frac{(24-20.75)^2}{20.75} + \frac{(22-20.75)^2}{20.75} + \frac{(10-20.75)^2}{20.75} = 8.04$$

    The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table 8 in Appendix B, χ².05 = 7.81473. The rejection region is χ² > 7.81473.

    Since the observed value of the test statistic falls in the rejection region (χ² = 8.04 > 7.81473), H0 is rejected. There is sufficient evidence to indicate that there are significant differences in the percentages of incidents in the four cause categories.
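    A minimal sketch of the Pearson goodness-of-fit computation used in Exercises 9.10 and 9.12, with equal hypothesized probabilities of .25. The critical value 7.81473 is copied from Table 8 as quoted in the text rather than computed, since the Python standard library has no chi-square quantile function; the helper name is illustrative.

```python
def chi_square_gof(observed, probs):
    """Pearson statistic sum((n_i - n p_i)^2 / (n p_i)) plus the expected counts."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


CHI2_05_DF3 = 7.81473  # upper .05 critical value for df = 3 (Table 8, as quoted above)

stat_910, exp_910 = chi_square_gof([59, 108, 82, 79], [0.25] * 4)
stat_912, exp_912 = chi_square_gof([27, 24, 22, 10], [0.25] * 4)

for label, stat, exp in [("9.10", stat_910, exp_910), ("9.12", stat_912, exp_912)]:
    # The chi-square approximation needs every expected count to be at least 5.
    assert min(exp) >= 5
    print(f"{label}: chi-square = {stat:.3f}, reject H0: {stat > CHI2_05_DF3}")
```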

    9.14 a. To determine if the traffic is equally divided among the three directions, we test:

    H0: p1 = p2 = p3 = 1/3
    Ha: At least two proportions are unequal

    The expected number in each category is:

    E(ni) = npi = 972(1/3) = 324 (i = 1, 2, 3)

    The observed and expected category counts are:

               Straight   Turn Right   Turn Left
    Observed   294        321          357
    Expected   324        324          324

    The test statistic is:

    $$\chi^2 = \sum \frac{(n_i - np_i)^2}{np_i} = \frac{(294-324)^2}{324} + \frac{(321-324)^2}{324} + \frac{(357-324)^2}{324} = 6.167$$

    The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table 8, Appendix B, χ².05 = 5.99147. The rejection region is χ² > 5.99147.

    Since the observed value of the test statistic falls in the rejection region (χ² = 6.167 > 5.99147), H0 is rejected. There is sufficient evidence to indicate the traffic is not equally divided at α = .05.


    b. To determine if more than one-third of all automobiles entering the intersection turn left, we test:

    H0: p = 1/3
    Ha: p > 1/3

    The rejection region for this large-sample, one-tailed test requires α = .05 in the upper tail of the z distribution. From Table 5, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.

    The test statistic is

    $$z = \frac{\hat p - p_0}{\sqrt{p_0 q_0/n}} = \frac{357/972 - 1/3}{\sqrt{(1/3)(2/3)/972}} = 2.25$$

    Since the observed value of the test statistic falls in the rejection region (z = 2.25 > 1.645), H0 is rejected. There is sufficient evidence to indicate the proportion of all automobiles entering this intersection that turn left exceeds 1/3 using α = .05.

    9.16 To determine if the three proportions differ, we test:

    H0: p1 = p2 = p3 = 1/3
    Ha: At least two of the proportions differ

    The expected cell counts are:

    E(ni) = npi = 90(1/3) = 30 (i = 1, 2, 3)

    The observed and expected category counts are:

               Brighter Side Up   Darker Side Up   Aligned
    Observed   58                 15               17
    Expected   30                 30               30

    The test statistic is:

    $$\chi^2 = \sum \frac{(n_i - np_i)^2}{np_i} = \frac{(58-30)^2}{30} + \frac{(15-30)^2}{30} + \frac{(17-30)^2}{30} = 39.267$$

    The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table 8, Appendix B, χ².05 = 5.99147. The rejection region is χ² > 5.99147.

    Since the observed value of the test statistic falls in the rejection region (χ² = 39.267 > 5.99147), H0 is rejected. There is sufficient evidence to indicate at least two of the proportions differ at α = .05.
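    Exercises 9.14 and 9.16 use the same goodness-of-fit statistic, plus a one-sample z test in 9.14b. A short sketch; the helper names are hypothetical and the critical values are the Table 8 and Table 5 values quoted above.

```python
from math import sqrt


def chi_square_gof(observed, probs):
    """Pearson goodness-of-fit statistic for hypothesized cell probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


def one_prop_z(x, n, p0):
    """Large-sample z statistic (p_hat - p0) / sqrt(p0 q0 / n)."""
    return (x / n - p0) / sqrt(p0 * (1 - p0) / n)


stat_914a = chi_square_gof([294, 321, 357], [1 / 3] * 3)  # compare with 5.99147 (df = 2)
z_914b = one_prop_z(357, 972, 1 / 3)                      # compare with z.05 = 1.645
stat_916 = chi_square_gof([58, 15, 17], [1 / 3] * 3)      # compare with 5.99147 (df = 2)

print(f"9.14a chi-square = {stat_914a:.3f}")
print(f"9.14b z = {z_914b:.2f}")
print(f"9.16 chi-square = {stat_916:.3f}")
```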


    9.18 For k = 2:

    $$\chi^2 = \sum_{i=1}^{2} \frac{(n_i - np_i)^2}{np_i} = \frac{(n_1 - np_1)^2}{np_1} + \frac{(n_2 - np_2)^2}{np_2}$$

    For a binomial experiment, n1 = y, n2 = n − y, p1 = p, and p2 = 1 − p, so

    $$\chi^2 = \frac{(y - np)^2}{np} + \frac{[(n - y) - n(1-p)]^2}{n(1-p)}$$

    Since (n − y) − n(1 − p) = −(y − np), both numerators equal (y − np)². Combining the two fractions over the common denominator np(1 − p) gives

    $$\chi^2 = \frac{(1-p)(y-np)^2 + p(y-np)^2}{np(1-p)} = \frac{(y-np)^2}{np(1-p)} = \left(\frac{y-np}{\sqrt{npq}}\right)^2 = z^2$$

    9.20 a. Yes, the sampling appears to satisfy the assumptions of a multinomial experiment. The experiment contains 120 trials and 2(4) = 8 categories. Since the 120 rats were randomly selected, the trials are considered independent and the probabilities are considered constant.

    b. The expected cell counts are

    $$E(n_{ij}) = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

    $$E(n_{1j}) = \frac{80(30)}{120} = 20 \quad\text{and}\quad E(n_{2j}) = \frac{40(30)}{120} = 10, \qquad j = 1, 2, 3, 4$$


    c. The test statistic is

    $$\chi^2 = \sum\sum \frac{[n_{ij}-E(n_{ij})]^2}{E(n_{ij})} = \frac{(27-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(19-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(3-10)^2}{10} + \frac{(10-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(16-10)^2}{10} = 12.9$$

    d. To determine if diet and presence/absence of cancer are independent, we test:

    H0: Diet and presence/absence of cancer are independent
    Ha: Diet and presence/absence of cancer are dependent

    The test statistic is χ² = 12.9.

    The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3. From Table 8, Appendix B, χ².05 = 7.81473. The rejection region is χ² > 7.81473.

    Since the observed value of the test statistic falls in the rejection region (χ² = 12.9 > 7.81473), H0 is rejected. There is sufficient evidence to indicate that diet and presence/absence of cancer are not independent at α = .05.

    e. Let p1 = proportion of rats on the high fat/no fiber diet with cancer and let p2 = proportion of rats on the high fat/fiber diet with cancer.

    $$\hat p_1 = \frac{27}{30} = .9, \qquad \hat p_2 = \frac{20}{30} = .667$$

    The confidence interval for the difference between two proportions is:

    $$(\hat p_1-\hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1\hat q_1}{n_1}+\frac{\hat p_2\hat q_2}{n_2}}$$

    For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table 5, Appendix B, z.025 = 1.96. The 95% confidence interval is:

    $$(.90-.667) \pm 1.96\sqrt{\frac{.9(.1)}{30}+\frac{.667(.333)}{30}} \;\Rightarrow\; .233 \pm .2 \;\Rightarrow\; (.033,\ .433)$$


    To obtain the confidence interval for the percentage, multiply the endpoints by 100%.

    The interval is (3.3%, 43.3%).

    We are 95% confident that the difference in the percentage of rats with cancer betweenthose on high fat/no fiber diets and those on high fat/fiber diets is between 3.3% and43.3%.

    Since the rats were divided into groups according to diets, we assume the groups areindependent.
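    The expected counts, test statistic, and part e interval in Exercise 9.20 can be reproduced as follows. The layout (cancer row first, diets as columns in the order of the χ² calculation, with columns 1 and 2 the high fat/no fiber and high fat/fiber diets) is an assumption read off the computations above; the helper name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def chi_square_independence(table):
    """Pearson chi-square for an r x c table, with E(n_ij) = (row total)(column total)/n."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Row 1 = rats with cancer, row 2 = rats without cancer; one column per diet (30 rats each).
rats = [[27, 20, 19, 14],
        [3, 10, 11, 16]]
stat_920, df_920, exp_920 = chi_square_independence(rats)
CHI2_05_DF3 = 7.81473  # Table 8 value for df = 3

# Part e: independent-samples CI for p1 - p2 (high fat/no fiber vs. high fat/fiber).
p1, p2, n1, n2 = 27 / 30, 20 / 30, 30, 30
z = NormalDist().inv_cdf(0.975)
half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci_920e = (p1 - p2 - half, p1 - p2 + half)

print(f"chi-square = {stat_920:.2f}, df = {df_920}, reject H0: {stat_920 > CHI2_05_DF3}")
print(f"95% CI for p1 - p2: ({ci_920e[0]:.3f}, {ci_920e[1]:.3f})")
```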

    9.22 Using MINITAB, the results of the analyses are:

    Tabulated statistics: Stops, Kills
    Using frequencies in Fr
    Rows: Stops   Columns: Kills

            1       2       3       4       5       All
    1       32      33      19      5       2       91
            28.31   34.88   18.71   6.57    2.53    91.00
    2       24      36      18      8       3       89
            27.69   34.12   18.29   6.43    2.47    89.00
    All     56      69      37      13      5       180
            56.00   69.00   37.00   13.00   5.00    180.00

    Cell Contents: Count
                   Expected count

    Pearson Chi-Square = 2.171, DF = 4, P-Value = 0.704
    Likelihood Ratio Chi-Square = 2.182, DF = 4, P-Value = 0.702

    * NOTE * 2 cells with expected counts less than 5

    First, we check whether the assumption about the expected cell counts is met. From the table, two expected cell counts are less than 5. Thus, the results of the test are suspect.

    To determine if the number of kills is related to whether the trial was stopped or not, we test:

    H0: Number of kills and whether the trial was stopped or not are independent
    Ha: Number of kills and whether the trial was stopped or not are dependent

    The test statistic is χ² = 2.171 (from the printout).

    The p-value of the test is .704. Since this p-value is so large, H0 is not rejected. There is insufficient evidence to indicate that the number of kills is related to whether the trial was stopped or not at α = .10.
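    A sketch that recomputes the MINITAB expected counts for Exercise 9.22 and counts the cells whose expected value falls below 5. The printout does not say which level of Stops each row represents, so the rows are kept in printout order; the helper name is illustrative.

```python
def expected_counts(table):
    """E(n_ij) = (row total)(column total) / n for every cell of an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


# Rows: the two levels of Stops; columns: Kills = 1, ..., 5 (frequencies from the printout).
kills = [[32, 33, 19, 5, 2],
         [24, 36, 18, 8, 3]]
exp_922 = expected_counts(kills)
stat_922 = sum(
    (o - e) ** 2 / e for orow, erow in zip(kills, exp_922) for o, e in zip(orow, erow)
)
small_cells = sum(e < 5 for row in exp_922 for e in row)

print(f"Pearson chi-square = {stat_922:.3f}")           # MINITAB reports 2.171
print(f"cells with expected count < 5: {small_cells}")  # MINITAB notes 2 such cells
```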


    9.24 a. The contingency table is shown below:

                           Response
    Altitude               High     Low     Total
    Less than 300 meters   105      85      190
    300-600 meters         121      77      198
    600 or more meters     59       17      76
    Total                  285      179     464

    b. To determine if flight response of the geese depends on altitude of the helicopter, we test:

    H0: Flight response and Altitude are independent
    Ha: Flight response and Altitude are dependent

    Statistix was used to create the following printout:

    Chi-Square Test for Heterogeneity or Independence
    for Count = Altitude Response

                         Response
    Altitude             Low        High
    1    Observed        85         105       190
         Expected        73.30      116.70
         Cell Chi-Sq     1.87       1.17
    2    Observed        77         121       198
         Expected        76.38      121.62
         Cell Chi-Sq     0.00       0.00
    3    Observed        17         59        76
         Expected        29.32      46.68
         Cell Chi-Sq     5.18       3.25
                         179        285       464

    Overall Chi-Square      11.48
    P-Value                 0.0032
    Degrees of Freedom      2

    Since α = .01 > p-value = .0032, H0 can be rejected. There is sufficient evidence to indicate that flight response of the geese depends on the altitude of the helicopter.

    c. The contingency table is shown below:

                              Response
    Lateral Distance          High     Low     Total
    Less than 1,000 meters    243      37      280
    1,000-2,000 meters        37       68      105
    2,000-3,000 meters        4        44      48
    3,000 or more meters      1        30      31
    Total                     285      179     464


    d. To determine if flight response of the geese depends on lateral distance of the helicopter, we test:

    H0: Flight response and Lateral distance are independent
    Ha: Flight response and Lateral distance are dependent

    Statistix was used to create the following printout:

    Chi-Square Test for Heterogeneity or Independence
    for Count = Lat_Cat Response

                         Response
    Lat_Cat              Low        High
    1    Observed        37         243       280
         Expected        108.02     171.98
         Cell Chi-Sq     46.69      29.33
    2    Observed        68         37        105
         Expected        40.51      64.49
         Cell Chi-Sq     18.66      11.72
    3    Observed        44         4         48
         Expected        18.52      29.48
         Cell Chi-Sq     35.07      22.03
    4    Observed        30         1         31
         Expected        11.96      19.04
         Cell Chi-Sq     27.22      17.09
                         179        285       464

    Overall Chi-Square      207.80
    P-Value                 0.0000
    Degrees of Freedom      3

    Since α = .01 > p-value = .0000, H0 can be rejected. There is sufficient evidence to indicate that flight response of the geese depends on the lateral distance of the helicopter.
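    The two Statistix statistics in Exercise 9.24 can be reproduced from the contingency tables with a small, self-contained helper (the function name is illustrative). Columns follow the printouts: Low response first, then High.

```python
def chi_square_independence(table):
    """Return (statistic, df) for Pearson's test of independence on an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat, (len(table) - 1) * (len(table[0]) - 1)


# Columns are (Low, High) flight response, as in the Statistix printouts.
altitude = [[85, 105], [77, 121], [17, 59]]          # < 300, 300-600, 600 or more meters
lateral = [[37, 243], [68, 37], [44, 4], [30, 1]]    # < 1,000 up to 3,000 or more meters

stat_alt, df_alt = chi_square_independence(altitude)  # Statistix: 11.48, df = 2
stat_lat, df_lat = chi_square_independence(lateral)   # Statistix: 207.80, df = 3
print(f"altitude: {stat_alt:.2f} (df {df_alt}); lateral: {stat_lat:.2f} (df {df_lat})")
```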

    9.26 a. To find the proportion of censored measurements for each of the six tractor lines, we take the number of censored measurements for each tractor line and divide it by the total number of measurements for that tractor line.

    $$\hat p_1 = \frac{175}{6222} = 0.028, \qquad \hat p_2 = \frac{236}{4692} = 0.050, \qquad \hat p_3 = \frac{319}{7140} = 0.045, \qquad \hat p_4 = \frac{231}{6120} = 0.038$$


    $$\hat p_5 = \frac{480}{10353} = 0.046, \qquad \hat p_6 = \frac{187}{4794} = 0.039$$

    b. Statistix was used to create the following printout:

    Chi-Square Test for Heterogeneity or Independence
    for Count = Lat_Cat Response

                         Response
    Tractor Line         Uncensored   Censored
    1    Observed        175          6047        6222
         Expected        257.61       5964.39
         Cell Chi-Sq     26.49        1.14
    2    Observed        236          4456        4692
         Expected        194.26       4497.74
         Cell Chi-Sq     8.97         0.39
    3    Observed        319          6821        7140
         Expected        295.62       6844.38
         Cell Chi-Sq     1.85         0.08
    4    Observed        231          5889        6120
         Expected        253.39       5866.61
         Cell Chi-Sq     1.98         0.09
    5    Observed        480          9873        10353
         Expected        428.64       9924.36
         Cell Chi-Sq     6.15         0.27
    6    Observed        187          4607        4794
         Expected        198.49       4595.51
         Cell Chi-Sq     0.66         0.03
                         1628         37693       39321

    Overall Chi-Square      48.09
    P-Value                 0.0000
    Degrees of Freedom      5

    To determine whether the proportion of censored measurements differs for the six tractor lines, we test:

    H0: Measurement type and tractor line are independent
    Ha: Measurement type and tractor line are dependent

    Since α = .01 > p-value = .0000, H0 can be rejected. There is sufficient evidence to indicate that the proportion of censored measurements differs for the six tractor lines.

    c. While the result is statistically significant, we have no way of knowing when a tractor line will produce a large number of censored measurements and when it will produce a small number of censored measurements. From a practical perspective, not much useful information has been learned.
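    A sketch reproducing the part a proportions and the part b statistic for Exercise 9.26. It assumes, as part a does, that the first count in each row (175, 236, ...) is the number of censored measurements and that 6222, 4692, ... are the row totals.

```python
censored = [175, 236, 319, 231, 480, 187]       # censored measurements, tractor lines 1-6
totals = [6222, 4692, 7140, 6120, 10353, 4794]  # all measurements per line

# Part a: proportion of censored measurements on each line.
props_926 = [round(c / t, 3) for c, t in zip(censored, totals)]

# Part b: 6 x 2 table (censored, uncensored) and Pearson's chi-square for independence.
table = [[c, t - c] for c, t in zip(censored, totals)]
col_tot = [sum(col) for col in zip(*table)]
n = sum(totals)
stat_926 = 0.0
for row, row_total in zip(table, totals):
    for obs, col_total in zip(row, col_tot):
        exp = row_total * col_total / n
        stat_926 += (obs - exp) ** 2 / exp
df_926 = (len(table) - 1) * (len(col_tot) - 1)

print(props_926)
print(f"chi-square = {stat_926:.2f}, df = {df_926}")
```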


    9.28 a. The contingency table is:

                              Committee
                              Acceptable   Rejected   Totals
    Inspector   Acceptable    101          23         124
                Rejected      10           19         29
    Totals                    111          42         153

    b. Yes. To plot the percentages, first convert frequencies to percentages by dividing the numbers in each column by the column total and multiplying by 100. Also, divide the row totals by the overall total and multiply by 100.

                              Committee
                              Acceptable               Rejected               Totals
    Inspector   Acceptable    (101/111)100 = 90.99%    (23/42)100 = 54.76%    (124/153)100 = 81.05%
                Rejected      (10/111)100 = 9.01%      (19/42)100 = 45.24%    (29/153)100 = 18.95%

    [Figure: proportion of inspector accept/reject decisions (vertical axis, 0 to 1) for each committee category (horizontal axis: Acceptable, Rejected, Total)]

    From the plot, it appears there is a relationship.

    c. Some preliminary calculations are:

    $$E_{11} = \frac{r_1 c_1}{n} = \frac{124(111)}{153} = 89.961, \qquad E_{12} = \frac{r_1 c_2}{n} = \frac{124(42)}{153} = 34.039$$

    $$E_{21} = \frac{r_2 c_1}{n} = \frac{29(111)}{153} = 21.039, \qquad E_{22} = \frac{r_2 c_2}{n} = \frac{29(42)}{153} = 7.961$$


    To determine if the inspector's classifications and the committee's classifications are related, we test:

    H0: The inspector's and committee's classifications are independent
    Ha: The inspector's and committee's classifications are dependent

    The test statistic is

    $$\chi^2 = \sum\sum \frac{(n_{ij}-E_{ij})^2}{E_{ij}} = \frac{(101-89.961)^2}{89.961} + \frac{(23-34.039)^2}{34.039} + \frac{(10-21.039)^2}{21.039} + \frac{(19-7.961)^2}{7.961} = 26.034$$

    The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1. From Table 8, Appendix B, χ².05 = 3.84146. The rejection region is χ² > 3.84146.

    Since the observed value of the test statistic falls in the rejection region (χ² = 26.034 > 3.84146), H0 is rejected. There is sufficient evidence to indicate the inspector's and committee's classifications are dependent at α = .05. This indicates that the inspector's decisions are related to the committee's decisions.
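    For the 2 × 2 table in Exercise 9.28, the expected-count formula and the standard 2 × 2 shortcut χ² = n(ad − bc)²/(r1 r2 c1 c2) (no continuity correction) give the same statistic. A short check; variable names are illustrative.

```python
# Rows: inspector (Acceptable, Rejected); columns: committee (Acceptable, Rejected).
a, b = 101, 23
c, d = 10, 19
observed = [[a, b], [c, d]]
n = a + b + c + d
rows = (a + b, c + d)   # r1 = 124, r2 = 29
cols = (a + c, b + d)   # c1 = 111, c2 = 42

# Expected counts E_ij = r_i c_j / n, as in the preliminary calculations.
expected = [[r * k / n for k in cols] for r in rows]
stat_928 = sum(
    (o - e) ** 2 / e for orow, erow in zip(observed, expected) for o, e in zip(orow, erow)
)

# Equivalent 2 x 2 shortcut (no continuity correction): n(ad - bc)^2 / (r1 r2 c1 c2).
shortcut_928 = n * (a * d - b * c) ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])

print(f"chi-square = {stat_928:.3f}, shortcut = {shortcut_928:.3f}")
print("reject H0 at .05:", stat_928 > 3.84146)
```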

    9.30 We wish to test:

    H0: p1 = p2 = p3 = p4 = p5 = p6 = p7 = 1/7
    Ha: At least two of these proportions are different from 1/7

    Our statistic is

    $$\chi^2 = \sum_{i=1}^{7} \frac{(O_i - e_i)^2}{e_i}$$

    The observed counts are found by using the table information:

    Oi = (number of specimens)(percentage with manganese nodules)

    The expected counts are found by ei = ni pi.

    These results are summarized as follows:

    Age                           Observed              Expected
    Miocene-recent                389(.059) = 23        389(1/7) = 55.6
    Oligocene                     140(.179) = 25        140(1/7) = 20.0
    Eocene                        214(.164) = 35        214(1/7) = 30.6
    Paleocene                     84(.214) = 18         84(1/7) = 12.0
    Late Cretaceous               247(.211) = 52        247(1/7) = 35.3
    Early and Middle Cretaceous   1120(.142) = 159      1120(1/7) = 160.0
    Jurassic                      99(.110) = 11         99(1/7) = 14.1
    Total                         323

    $$\chi^2 = \frac{(23-55.6)^2}{55.6} + \frac{(25-20.0)^2}{20.0} + \cdots + \frac{(11-14.1)^2}{14.1} = 32.59$$


    The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 7 − 1 = 6. From Table 8, Appendix B, χ².05 = 12.5916. Reject H0 if χ² > 12.5916.

    Since the observed value of the test statistic falls in the rejection region (χ² = 32.59 > 12.5916), H0 is rejected. There is sufficient evidence to indicate that at least two of the proportions differ from 1/7 at α = .05.
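    The sketch below mirrors the 9.30 calculation as the solution sets it up: observed counts from the nodule percentages and expected counts ni/7. Because the expected counts are not rounded to one decimal here, the statistic comes out near 32.60 rather than 32.59. The critical value 12.5916 is the Table 8 value quoted above.

```python
specimens = [389, 140, 214, 84, 247, 1120, 99]
pct_nodules = [0.059, 0.179, 0.164, 0.214, 0.211, 0.142, 0.110]

# Mirrors the solution: O_i = (specimens)(percentage), e_i = n_i(1/7).
observed = [round(n * p) for n, p in zip(specimens, pct_nodules)]
expected = [n / 7 for n in specimens]
stat_930 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(observed, sum(observed))
print(f"chi-square = {stat_930:.2f}, reject H0 at .05: {stat_930 > 12.5916}")
```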

    9.32 a. To determine if the percentages of the different types of programming statements differ for the two languages, we test:

    H0: The proportions of the different types of programming statements are the same for the two languages
    Ha: The proportions of the different types of programming statements are different for the two languages

    The expected category counts are:

    $$E(n_{ij}) = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

    $$E(n_{11}) = \frac{2170(10{,}412)}{19{,}882} = 1136.407, \qquad E(n_{12}) = \frac{2170(9470)}{19{,}882} = 1033.593, \qquad \ldots, \qquad E(n_{52}) = \frac{726(9470)}{19{,}882} = 345.801$$

    The observed and expected (in parentheses) category counts are:

                 ALGOL               PASCAL              Totals
    IF           125 (1136.407)      2,045 (1033.593)    2,170
    FOR          968 (690.223)       350 (627.777)       1,318
    IO           135 (1037.953)      1,847 (944.047)     1,982
    ASSIGNMENT   8,923 (7167.218)    4,763 (6518.782)    13,686
    Other        261 (380.199)       465 (345.801)       726
    Totals       10,412              9,470               19,882

    The test statistic is:

    $$\chi^2 = \sum\sum \frac{[n_{ij}-E(n_{ij})]^2}{E(n_{ij})} = \frac{(125-1136.407)^2}{1136.407} + \frac{(2045-1033.593)^2}{1033.593} + \cdots + \frac{(465-345.801)^2}{345.801} = 4755.20$$


    The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (5 − 1)(2 − 1) = 4. From Table 8, Appendix B, χ².05 = 9.48773. The rejection region is χ² > 9.48773.

    Since the observed value of the test statistic falls in the rejection region (χ² = 4755.20 > 9.48773), H0 is rejected. There is sufficient evidence to indicate the percentages of the different types of programming statements differ for the two languages at α = .05.

    b. The form of the confidence interval for (pA − pP) is:

    $$(\hat p_A-\hat p_P) \pm z_{\alpha/2}\sqrt{\frac{\hat p_A(1-\hat p_A)}{n_A}+\frac{\hat p_P(1-\hat p_P)}{n_P}}$$

    $$\hat p_A = \frac{X_A}{n_A} = \frac{8923}{10{,}412} = .857, \qquad \hat p_P = \frac{X_P}{n_P} = \frac{4763}{9470} = .503$$

    For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table 5, Appendix B, z.025 = 1.96. The confidence interval is:

    $$(.857-.503) \pm 1.96\sqrt{\frac{.857(1-.857)}{10{,}412}+\frac{.503(1-.503)}{9470}} \;\Rightarrow\; .354 \pm .0121 \;\Rightarrow\; (.3419,\ .3661)$$
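    A sketch for Exercise 9.32 that rebuilds the 5 × 2 table from the observed counts, recomputes the statistic, and forms the part b interval. The ALGOL ASSIGNMENT count is taken as 8,923, the value part b uses; with it, the ALGOL column sums to 10,412. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Rows: IF, FOR, IO, ASSIGNMENT, Other; columns: ALGOL, PASCAL.
counts = [[125, 2045],
          [968, 350],
          [135, 1847],
          [8923, 4763],
          [261, 465]]

row_tot = [sum(r) for r in counts]
col_tot = [sum(c) for c in zip(*counts)]
n = sum(row_tot)
stat_932 = sum(
    (counts[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
    for i in range(len(counts))
    for j in range(2)
)

# Part b: CI for p_A - p_P, the proportions of ASSIGNMENT statements (independent samples).
pA, nA = 8923 / 10412, 10412
pP, nP = 4763 / 9470, 9470
half = NormalDist().inv_cdf(0.975) * sqrt(pA * (1 - pA) / nA + pP * (1 - pP) / nP)
ci_932b = (pA - pP - half, pA - pP + half)

print(col_tot, n)
print(f"chi-square = {stat_932:.2f}, df = {(len(counts) - 1) * (2 - 1)}")
print(f"95% CI: ({ci_932b[0]:.4f}, {ci_932b[1]:.4f})")
```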

    9.34 a. The form of the contingency tables will all be:

                         Predicted EVG
                         No          Yes       Total
    Defect   FALSE       439 + y     10 − y    449
             TRUE        49 − y      y         49
             Total       488         10        498

    b. The hypergeometric formula for these tables is:

    $$P(y) = \frac{\binom{449}{10-y}\binom{49}{y}}{\binom{498}{10}}, \qquad y = 0, 1, 2, \ldots, 10$$
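    These hypergeometric probabilities are easy to compute exactly with `math.comb` (Python 3.8+). This sketch assumes, as part c implies, that the observed count in the TRUE/Yes cell is y = 2; the constant names are illustrative.

```python
from math import comb

TOTAL, FALSE_ROW, TRUE_ROW, YES_COL = 498, 449, 49, 10


def p_y(y):
    """Hypergeometric probability that y of the 10 'Yes' predictions fall in the TRUE (defect) row."""
    return comb(FALSE_ROW, YES_COL - y) * comb(TRUE_ROW, y) / comb(TOTAL, YES_COL)


probs_934 = {y: p_y(y) for y in range(YES_COL + 1)}

# One-sided Fisher exact p-value for an observed y = 2: P(y = 2, 3, ..., 10).
p_value_934 = sum(probs_934[y] for y in range(2, YES_COL + 1))

for y, p in probs_934.items():
    print(y, round(p, 4))
print(f"p-value = {p_value_934:.4f}")
```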


    Because the sample sizes are large, these factorials make the probabilities difficult to calculate by hand. The resulting probabilities are shown below:

    y     P(y)
    0     0.3514
    1     0.3914
    2     0.1917
    3     0.0544
    4     0.0099
    5     0.0012
    6     0.0001
    7     0
    8     0
    9     0
    10    0

    c. The Fisher's exact test p-value is found by adding the probabilities of all tables at least as contradictory to H0 as the one observed: p-value = P(y = 2 or 3 or ... or 10) = 0.2572.

    d. We see that these two probabilities are equal.

    9.36 a. The form of the confidence interval is:

    $$\hat p_i \pm z_{\alpha/2}\sqrt{\frac{\hat p_i(1-\hat p_i)}{n}}$$

    $$\hat p_1 = .60, \qquad \hat p_2 = .23, \qquad \hat p_3 = .17$$

    For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table 5, Appendix B, z.025 = 1.96. The 95% confidence intervals are:

    For p1: $$.60 \pm 1.96\sqrt{\frac{.60(.40)}{1132}} \;\Rightarrow\; .60 \pm .029 \;\Rightarrow\; (.571,\ .629)$$

    For p2: $$.23 \pm 1.96\sqrt{\frac{.23(.77)}{1132}} \;\Rightarrow\; .23 \pm .025 \;\Rightarrow\; (.205,\ .255)$$

    For p3: $$.17 \pm 1.96\sqrt{\frac{.17(.83)}{1132}} \;\Rightarrow\; .17 \pm .022 \;\Rightarrow\; (.148,\ .192)$$

    b. We want to test:

    H0: p1 = .8, p2 = .1, and p3 = .1
    Ha: At least two proportions differ from the specified values

    The expected counts in each category are:

    E(n1) = np1 = 1132(.8) = 905.6
    E(n2) = np2 = 1132(.1) = 113.2
    E(n3) = np3 = 1132(.1) = 113.2


    The observed and expected category counts are:

               Appropriate   Inappropriate   Avoidable
    Observed   679           261             192
    Expected   905.6         113.2           113.2

    The test statistic is:

    $$\chi^2 = \sum \frac{(n_i - np_i)^2}{np_i} = \frac{(679-905.6)^2}{905.6} + \frac{(261-113.2)^2}{113.2} + \frac{(192-113.2)^2}{113.2} = 304.5$$

    The rejection region requires α = .10 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table 8, Appendix B, χ².10 = 4.60517. The rejection region is χ² > 4.60517.

    Since the observed value of the test statistic falls in the rejection region (χ² = 304.5 > 4.60517), H0 is rejected. There is sufficient evidence to indicate at least two proportions differ from the specified values at α = .10.
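    A sketch for Exercise 9.36 using the observed counts 679, 261, and 192. Because it uses the unrounded sample proportions rather than .60, .23, and .17, the interval endpoints may differ from those above in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

observed = [679, 261, 192]   # Appropriate, Inappropriate, Avoidable
n = sum(observed)            # 1132
z = NormalDist().inv_cdf(0.975)

# Part a: separate 95% CIs for each cell proportion, using the unrounded p_hat.
cis_936a = []
for x in observed:
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    cis_936a.append((p - half, p + half))

# Part b: goodness of fit to the hypothesized probabilities (.8, .1, .1).
expected = [n * p for p in (0.8, 0.1, 0.1)]
stat_936b = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for lo, hi in cis_936a:
    print(f"({lo:.3f}, {hi:.3f})")
print(f"chi-square = {stat_936b:.1f}, reject H0 at .10: {stat_936b > 4.60517}")
```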

    9.38 The Statistix printout for the analysis appears below:

    Chi-Square Test for Heterogeneity or Independence
    for count = Year abuse

                         abuse
    Year                 1        2        3        4
    1    Observed        7        5        9        8        29
         Expected        9.61     8.22     5.74     5.43
         Cell Chi-Sq     0.71     1.26     1.85     1.22
    2    Observed        22       18       6        6        52
         Expected        17.24    14.74    10.29    9.73
         Cell Chi-Sq     1.31     0.72     1.79     1.43
    3    Observed        12       15       6        12       45
         Expected        14.92    12.75    8.90     8.42
         Cell Chi-Sq     0.57     0.40     0.95     1.52
    4    Observed        21       15       16       9        61
         Expected        20.22    17.29    12.07    11.42
         Cell Chi-Sq     0.03     0.30     1.28     0.51
                         62       53       37       35       187

    Overall Chi-Square      15.86
    P-Value                 0.0699
    Degrees of Freedom      9
    Cases Included 16    Missing Cases 0

    To determine if the proportions of different types of abuse are changing over time, we test:

    H0: Types of abuse and year are independent
    Ha: Types of abuse and year are dependent


    The expected category counts are shown in the printout.

    The test statistic is

    $$\chi^2 = \sum\sum \frac{[n_{ij}-E(n_{ij})]^2}{E(n_{ij})} = 15.86$$

    from the printout.

    The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (4 − 1)(4 − 1) = 9. From Table 8, Appendix B, χ².05 = 16.9190. The rejection region is χ² > 16.9190.

    Since the observed value of the test statistic does not fall in the rejection region (χ² = 15.86 < 16.9190), H0 is not rejected. There is insufficient evidence to indicate the proportions of different types of abuse are changing over time at α = .05.
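    A quick recomputation of the Exercise 9.38 statistic from the observed counts in the Statistix printout, together with the smallest expected count (relevant to the requirement that expected counts be at least 5). The helper is illustrative.

```python
def independence_test(table):
    """Pearson chi-square, degrees of freedom, and smallest expected count for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat, smallest = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            smallest = min(smallest, exp)
            stat += (obs - exp) ** 2 / exp
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1), smallest


# Rows: Year 1-4; columns: abuse type 1-4 (observed counts from the printout).
abuse = [[7, 5, 9, 8],
         [22, 18, 6, 6],
         [12, 15, 6, 12],
         [21, 15, 16, 9]]
stat_938, df_938, min_exp_938 = independence_test(abuse)
print(f"chi-square = {stat_938:.2f}, df = {df_938}, smallest expected = {min_exp_938:.2f}")
print("reject H0 at .05:", stat_938 > 16.9190)
```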

    9.40 a. To determine if pesticide use depends on orchard type, we test:

    H0: Pesticide and orchard type are independent
    Ha: Pesticide and orchard type are dependent

    The test statistic is χ² = 31000.416 (from printout). The p-value for the test is p = .000. At α = .01, α > p-value, and we reject H0. There is sufficient evidence to indicate that pesticide used and orchard type are dependent.

    PHStat was used to conduct the desired analysis, and the following printout was created:

    Observed Frequencies

    Column variable

    Row variable Almonds Peaches Nectarines Total

    Chlor. 41077 4419 11594 57090

    Diazinon 102935 9651 5928 118514

    Methid. 21240 5198 1790 28228

    Parathion 136064 53384 24417 213865

    Total 301316 72652 43729 417697

    Expected Frequencies

    Column variable

    Row variable Almonds Peaches Nectarines Total

    Chlor. 41183.27505 9929.931697 5976.79325 57090

    Diazinon 85492.98756 20613.69636 12407.31608 118514

    Methid. 20362.96178 4909.82855 2955.209666 28228

    Parathion 154276.7756 37198.54339 22389.681 213865

    Total 301316 72652 43729 417697


    Data

    Level of Significance 0.01

    Number of Rows 4

    Number of Columns 3

    Degrees of Freedom 6

    Results

    Critical Value 16.8118718

    Chi-Square Test Statistic 31000.41584

    p-Value 0

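    b. We will calculate 95% confidence intervals for the rate of parathion application for the three orchard types. Each interval has the form $\hat{p} \pm z_{.025}\sqrt{\hat{p}\hat{q}/n}$, with $z_{.025} = 1.96$.

    Almonds: $\hat{p} = \dfrac{136{,}064}{301{,}316} = .45$

    $$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .45 \pm 1.96\sqrt{\frac{.45(.55)}{301{,}316}} \Rightarrow .45 \pm .002$$

    Nectarines: $\hat{p} = \dfrac{24{,}417}{43{,}729} = .56$

    $$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .56 \pm 1.96\sqrt{\frac{.56(.44)}{43{,}729}} \Rightarrow .56 \pm .005$$

    Peaches: $\hat{p} = \dfrac{53{,}384}{72{,}652} = .73$

    $$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .73 \pm 1.96\sqrt{\frac{.73(.27)}{72{,}652}} \Rightarrow .73 \pm .003$$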
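All three intervals use the same large-sample formula, so a small helper makes the arithmetic easy to repeat. This is a sketch: `wald_interval` is a name we chose, and $z = 1.96$ is hard-coded as the default for 95% confidence.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * q_hat / n).
    Returns (p_hat, half_width)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, half_width


# (parathion applications, total applications) per orchard type, from the table
orchards = {
    "Almonds": (136064, 301316),
    "Nectarines": (24417, 43729),
    "Peaches": (53384, 72652),
}

results = {}
for name, (x, n) in orchards.items():
    p_hat, hw = wald_interval(x, n)
    results[name] = (p_hat, hw)
    print(f"{name}: {p_hat:.2f} +/- {hw:.3f}")
```

The helper keeps $\hat{p}$ unrounded; rounded to two and three decimals it gives the same $\hat{p}$ and margins as the hand calculation.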

    9.42 a. Test $H_0\colon p_1 = p_2 = .5$

    $H_a\colon p_1 \ne p_2$

    The test statistic is:

    $$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{\left(O_{ij} - \hat{e}_{ij}\right)^2}{\hat{e}_{ij}}$$


    c. Fisher's exact test gives a $p$-value of $p = .0173$. When testing at $\alpha = .01$, $H_0$ cannot be rejected because $p = .0173 > .01$. There is insufficient evidence to detect a difference in proportions, which agrees with our conclusion above in part a.

    9.44 The Statistix printout for the analysis is shown below:

    Chi-Square Test for Heterogeneity or Independence for count = Technology Group

                                         Group
    Technology                1        2        3        4
    1         Observed       21       42       11       25       99
              Expected    24.75    24.75    24.75    24.75
              Cell Chi-Sq  0.57    12.02     7.64     0.00
    2         Observed       18        2       16       13       49
              Expected    12.25    12.25    12.25    12.25
              Cell Chi-Sq  2.70     8.58     1.15     0.05
    3         Observed       11        6       23       12       52
              Expected    13.00    13.00    13.00    13.00
              Cell Chi-Sq  0.31     3.77     7.69     0.08
                             50       50       50       50      200

    Overall Chi-Square    44.548
    P-Value               0.0000
    Degrees of Freedom    6

    Cases Included 12     Missing Cases 0

    a. To determine if public opinion regarding the choice of future technology options for generating electricity differs among the four groups, we test:

    $H_0$: Choice and group are independent

    $H_a$: Choice and group are dependent

    The test statistic is $\chi^2 = 44.548$.

    The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with df $= (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6$. From Table 8, Appendix B, $\chi^2_{.10} = 10.6446$. The rejection region is $\chi^2 > 10.6446$.

    Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 44.548 > 10.6446$), $H_0$ is rejected. There is sufficient evidence to indicate that public opinion does differ among the four groups at $\alpha = .10$.
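To see where the large statistic comes from, the sketch below rebuilds the expected counts and the per-cell contributions $(O_{ij} - \hat{e}_{ij})^2/\hat{e}_{ij}$ shown in the Statistix printout, then sums them. The helper name `cell_contributions` is our own; only the counts are taken from the printout.

```python
def cell_contributions(table):
    """Return the expected counts and each cell's (O - E)^2 / E
    for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    contrib = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return expected, contrib


# Rows: technology options 1-3; columns: groups 1-4 (from the printout)
observed = [
    [21, 42, 11, 25],
    [18, 2, 16, 13],
    [11, 6, 23, 12],
]

expected, contrib = cell_contributions(observed)
for row in contrib:
    print(" ".join(f"{c:6.2f}" for c in row))
total = sum(map(sum, contrib))
print(f"Overall chi-square = {total:.3f}")
```

The two largest contributions (12.02 and 8.58) both sit in Group 2, where Technology 1 is chosen far more often, and Technology 2 far less often, than independence would predict.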

    b. Let $p_1$ = proportion supporting the coal option and $p_2$ = proportion supporting the nuclear option.


    To determine if the proportion supporting the coal option exceeds the proportion supporting the nuclear option, we test:

    $H_0\colon p_1 - p_2 = 0$

    $H_a\colon p_1 - p_2 > 0$

    $$\hat{p}_1 = \frac{99}{200} = .495 \qquad \hat{p}_2 = \frac{49}{200} = .245 \qquad \bar{p} = \frac{99 + 49}{200 + 200} = .37$$

    The rejection region requires $\alpha = .10$ in the upper tail of the $z$ distribution. From Table 5, Appendix B, $z_{.10} = 1.282$. The rejection region is $z > 1.282$.

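    The test statistic is:

    $$z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\dfrac{\bar{p}(1 - \bar{p}) + \bar{p}(1 - \bar{p}) + 2\bar{p}^2}{n}}} = \frac{(.495 - .245) - 0}{\sqrt{\dfrac{.37(.63) + .37(.63) + 2(.37)^2}{200}}} = 4.11$$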

    Since the observed value of the test statistic falls in the rejection region ($z = 4.11 > 1.282$), $H_0$ is rejected. There is sufficient evidence to indicate the proportion supporting coal exceeds the proportion supporting nuclear at $\alpha = .10$.
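Because both counts come from the same multinomial sample of $n = 200$, the variance includes the $+2\bar{p}^2$ covariance term. The sketch below evaluates the statistic exactly as written above; the function name `z_multinomial_difference` is hypothetical, chosen for illustration.

```python
import math


def z_multinomial_difference(x1, x2, n, d0=0.0):
    """z statistic for p1 - p2 when both counts come from one multinomial
    sample of size n, using the pooled estimate p_bar = (x1 + x2) / (n + n)."""
    p1_hat, p2_hat = x1 / n, x2 / n
    p_bar = (x1 + x2) / (n + n)
    # Variance as in the solution: [p(1-p) + p(1-p) + 2p^2] / n
    se = math.sqrt(
        (p_bar * (1 - p_bar) + p_bar * (1 - p_bar) + 2 * p_bar ** 2) / n
    )
    return ((p1_hat - p2_hat) - d0) / se


z = z_multinomial_difference(99, 49, 200)
print(f"z = {z:.2f}")
```

Algebraically the numerator under the root simplifies to $2\bar{p}$, so the standard error is $\sqrt{2(.37)/200} = \sqrt{.0037}$; the longer form is kept only to mirror the formula.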

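    c. The form of the confidence interval is:

    $$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

    $$\hat{p} = \frac{16}{50} = .32$$

    For confidence coefficient .90, $\alpha = 1 - .90 = .10$ and $\alpha/2 = .10/2 = .05$. From Table 5, Appendix B, $z_{.05} = 1.645$. The 90% confidence interval is:

    $$.32 \pm 1.645\sqrt{\frac{.32(1 - .32)}{50}} \Rightarrow .32 \pm .109 \Rightarrow (.211, .429)$$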

    9.46 The data were tested using Fisher's exact test, and the results are shown below:

    Two by Two Tables

    +----------+----------+
    |    10    |     6    |   16
    +----------+----------+
    |    12    |     2    |   14
    +----------+----------+
         22          8         30

    Fisher Exact Tests:   Lower Tail 0.1541   Upper Tail 0.0715   Two Tailed 0.2255


    To determine if the fidelity and selectivity are dependent, we test:

    $H_0$: Fidelity and selectivity are independent

    $H_a$: Fidelity and selectivity are dependent

    The $p$-value for the test is .2255.

    When testing at $\alpha = .05$, $H_0$ cannot be rejected because $p = .2255 > .05$. There is insufficient evidence to indicate that fidelity and selectivity are dependent when testing at $\alpha = .05$.
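The Lower Tail and Two Tailed values in the printout can be reproduced from the hypergeometric distribution of the upper-left cell, given the fixed margins (16, 14; 22, 8). The sketch below uses exact integer arithmetic with `math.comb` (Python 3.8+). Its two-tailed rule, which sums the probabilities of every table no more likely than the observed one, is a common convention that matches .2255 here; we do not claim it is Statistix's documented algorithm. The function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the upper-left cell follows a hypergeometric
    distribution. Returns (lower-tail p, two-tailed p).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Integer numerators C(col1, x) * C(n - col1, row1 - x);
    # every table shares the denominator C(n, row1).
    weights = {x: comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(lo, hi + 1)}
    total = comb(n, row1)
    lower = sum(w for x, w in weights.items() if x <= a) / total
    # Two-tailed: all tables whose probability does not exceed the observed one
    two_tailed = sum(w for w in weights.values() if w <= weights[a]) / total
    return lower, two_tailed


lower, two_tailed = fisher_exact_2x2(10, 6, 12, 2)
print(f"lower tail = {lower:.4f}, two tailed = {two_tailed:.4f}")
```

Comparing integer numerators avoids the floating-point tolerance that a probability-based comparison would need.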

    9.48 Some preliminary calculations are:

    $$e_1 = e_2 = e_3 = e_4 = e_5 = e_6 = e_7 = e_8 = np_i = 714(.125) = 89.25$$

    a. To determine if the probabilities of worker accidents are higher for some time periods, we test:

    $H_0\colon p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = p_8 = .125$

    $H_a$: At least two of the cell probabilities differ from each other

    The test statistic is:

    $$\chi^2 = \sum_i \frac{(O_i - e_i)^2}{e_i} = \frac{(93 - 89.25)^2}{89.25} + \frac{(71 - 89.25)^2}{89.25} + \frac{(79 - 89.25)^2}{89.25} + \cdots + \frac{(110 - 89.25)^2}{89.25} = 15.905$$

    The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with df $= k - 1 = 8 - 1 = 7$. From Table 8, Appendix B, $\chi^2_{.10} = 12.0170$. The rejection region is $\chi^2 > 12.0170$.

    Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 15.905 > 12.0170$), $H_0$ is rejected. There is sufficient evidence to indicate the probabilities of worker accidents are higher in some time periods at $\alpha = .10$.
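The goodness-of-fit statistic can be checked with a few lines of Python. Note the assumption: the solution prints only some of the eight counts. 93, 71, 79 and 110 appear in part a, and 98, 89, 102 come from part b (taken here in that order as periods 5 to 7; the order does not affect $\chi^2$). The fourth count, 72, is inferred so that the counts total 714; with it the sketch reproduces $\chi^2 = 15.905$. The function name `chi_square_gof` is hypothetical.

```python
def chi_square_gof(observed, probs):
    """Pearson goodness-of-fit statistic sum((O_i - e_i)^2 / e_i),
    with e_i = n * p_i. Returns (statistic, df, expected counts)."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1, expected


# ASSUMPTION: period 4's count (72) is inferred from the 714 total;
# periods 5-7 follow the order in which part b sums them.
counts = [93, 71, 79, 72, 98, 89, 102, 110]
stat, df, expected = chi_square_gof(counts, [0.125] * 8)
print(f"chi-square = {stat:.3f}, df = {df}, e_i = {expected[0]}")
```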

    b. $\hat{p}_1 = \dfrac{98 + 89 + 102 + 110}{714} = \dfrac{399}{714} = .5588$

    $H_0\colon p_1 = .5$

    $H_a\colon p_1 > .5$


    The test statistic is

    $$z = \frac{\hat{p}_1 - p_{10}}{\sqrt{\dfrac{p_{10}q_{10}}{n}}} = \frac{.5588 - .5}{\sqrt{\dfrac{.5(.5)}{714}}} = 3.14$$

    The rejection region requires $\alpha = .10$ in the upper tail of the $z$ distribution. From Table 5, Appendix B, $z_{.10} = 1.28$. The rejection region is $z > 1.28$.

    Since the observed value of the test statistic falls in the rejection region ($z = 3.14 > 1.28$), $H_0$ is rejected. There is sufficient evidence to indicate the probability of an accident during the last 4 hours of a shift is greater than during the first 4 hours at $\alpha = .10$.
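The one-sample test above can be sketched in a few lines, assuming the same large-sample statistic $z = (\hat{p}_1 - p_{10})/\sqrt{p_{10}q_{10}/n}$. The helper name `z_one_proportion` is our own.

```python
import math


def z_one_proportion(x, n, p0):
    """Large-sample z statistic (p_hat - p0) / sqrt(p0 * q0 / n)."""
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


# 399 of the 714 accidents occurred during the last 4 hours of a shift
z = z_one_proportion(399, 714, 0.5)
print(f"z = {z:.2f}")
```

The hand calculation rounds $\hat{p}_1$ to .5588 before dividing; keeping it unrounded gives the same $z = 3.14$ to two decimals.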