

  • 1

    BES Tutorial Sample Solutions, S2 2010

    This document will be posted on the BES website with one week's delay.

    WEEK 10 TUTORIAL EXERCISES (To be discussed in the week starting September 27)

    1. State whether the normal distribution, t distribution or neither would be used to test hypotheses regarding the population mean in the following situations:

    (a) Population normally distributed, \(\sigma^2\) unknown, sample size less than 30.

    t distribution.

    (b) Population normally distributed, \(\sigma^2\) unknown, sample size greater than 30.

    t distribution, although as the sample size gets very large this effectively becomes the same as using the normal.

    (c) Population normally distributed, \(\sigma^2\) known, sample size less than 30.

    Normal distribution.

    (d) Population not normally distributed, \(\sigma^2\) unknown, sample size greater than 30.

    Because the sample size is large you can invoke the CLT and use the fact that \(s^2\) is a consistent estimator of \(\sigma^2\) to justify using the normal distribution.

    (e) Population not normally distributed, \(\sigma^2\) unknown, sample size less than 30.

    Here the sampling distribution is unknown and hence we don't know how to test a hypothesis about \(\mu\) in this circumstance. In practice you could either assume the population is approximately normally distributed and proceed as in (a), or alternatively invoke the CLT and proceed as in (d). How well either of these solutions works ultimately depends on the (unknown) extent of nonnormality of the population distribution.
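The rules of thumb in Question 1 can be summarised as a small decision function. This is only a sketch: the function name `choose_distribution` and its arguments are hypothetical, and treating a sample of exactly 30 as "large" is an assumption, since the question only covers sizes below and above 30.

```python
def choose_distribution(population_normal: bool, sigma_known: bool, n: int) -> str:
    """Return 'normal', 't' or 'neither' for a test about the population mean."""
    large_sample = n >= 30  # assumption: the cut-off of 30 counts as large

    if population_normal:
        # Normal population: sigma known -> normal, sigma unknown -> t (any n).
        return "normal" if sigma_known else "t"
    if large_sample:
        # Non-normal population, large n: CLT justifies the normal distribution.
        return "normal"
    # Non-normal population, small n: sampling distribution unknown.
    return "neither"


# The five cases (a) to (e) of Question 1.
cases = {
    "a": choose_distribution(True, False, 20),
    "b": choose_distribution(True, False, 50),
    "c": choose_distribution(True, True, 20),
    "d": choose_distribution(False, False, 50),
    "e": choose_distribution(False, False, 20),
}
print(cases)
```

For case (e) the function returns "neither"; the practical workarounds mentioned in the answer (assuming approximate normality, or invoking the CLT anyway) are judgement calls that the function does not try to encode.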

  • 2

    2. Reconsider Question 2 of the Week 9 exercises. In that exercise, a real estate expert claimed the current mean value of houses in a particular area was more than $250,000. A random sample of 150 recent sales prices in the area yielded a sample mean of $265,000 and it is known that house values in the area are approximately normally distributed with a standard deviation of $50,000.

    (a) If in fact the population mean house value in the area is $260,000, what is the probability of committing a type II error in performing an upper tail test of the null hypothesis that the mean house value price in the area is $250,000, as in Question 1 part (a) of the Week 9 exercises? What is the power of the test in these circumstances? State in words what the power of the test means.

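    Let \(X\) = value of a house in the area.

    \(\bar{x} = \$265{,}000\), \(\sigma = \$50{,}000\), \(n = 150\), \(X \sim N(\mu, \sigma^2)\)

    \(H_0: \mu = 250{,}000\); \(H_1: \mu > 250{,}000\)

    Rejection region (using \(z_{0.05} = 1.645\)):

    \[ \bar{X} > 250{,}000 + 1.645 \times \frac{50{,}000}{\sqrt{150}} = 256{,}715.68 \]

    Thus the probability of a Type II error (probability of not rejecting \(H_0\) when it is false) is

    \[ \beta = P(\bar{X} < 256{,}715.68 \mid \mu = 260{,}000) = P\left(Z < \frac{256{,}715.68 - 260{,}000}{50{,}000/\sqrt{150}}\right) = P(Z < -0.8) = 0.2119 \]

    \[ \text{Power} = 1 - \beta = 0.7881 \]

    The power of the test gives the probability of correctly rejecting the null hypothesis when it is false.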
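The Type II error and power calculation for Question 2(a) can be checked numerically. The sketch below uses only Python's standard library; the variable names are illustrative, and the 5% significance level is an assumption read off the critical value 1.645 in the worked solution. Because it uses an unrounded z value, the results differ from the hand calculation in the third or fourth significant figure.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 250_000    # mean under the null hypothesis
mu1 = 260_000    # true mean assumed in part (a)
sigma = 50_000   # known population standard deviation
n = 150          # sample size
alpha = 0.05     # assumption: implied by the critical value 1.645

std_error = sigma / sqrt(n)

# Upper-tail test: reject H0 when the sample mean exceeds this value.
z_alpha = NormalDist().inv_cdf(1 - alpha)
critical_value = mu0 + z_alpha * std_error

# Type II error: the sample mean falls below the critical value
# even though the true mean is mu1.
beta = NormalDist(mu=mu1, sigma=std_error).cdf(critical_value)
power = 1 - beta

print(f"critical value = {critical_value:,.2f}")
print(f"beta = {beta:.4f}, power = {power:.4f}")
```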

  • 3

    (b) Illustrate your answer to part (a) above by showing on a diagram the areas representing the probability of a type II error and the power of the test.

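    [Diagram: two normal sampling distributions of \(\bar{X}\), one centred at 250,000 (under \(H_0: \mu = 250{,}000\)) and one centred at 260,000, with the critical value $256,715.68 marked between them. Under the curve centred at 260,000, the area to the left of $256,715.68 is \(\beta = 1 - \text{power}\) and the area to the right is the power.]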

    3. A company running an urban rail service wishes to estimate its daily average number of late running trains on week days. For 10 randomly selected week days, it finds the following numbers of late running trains:

    32, 10, 9, 18, 25, 15, 14, 18, 22, 16

    (a) Assuming the number of late running trains on a weekday is approximately normally distributed, calculate a 90% confidence interval for the mean number of late running trains on a week day.

    Let \(X\) = number of late trains on a weekday.

    \(\alpha = 0.1\), \(\bar{x} = 17.9\), \(s^2 = 48.32\), \(s = 6.9514\)

    Since \(\sigma^2\) is unknown, \(n\) is small and the underlying distribution is normal, we construct the confidence interval using the t distribution.

  • 4

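    Required interval is

    \[ \bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = 17.9 \pm t_{0.05,\,9} \times \frac{6.9514}{\sqrt{10}} = 17.9 \pm 1.833 \times \frac{6.9514}{\sqrt{10}} = 17.9 \pm 4.029 = (13.871,\ 21.929) \]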
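The 90% interval for Question 3(a) can be reproduced from the raw data as a quick check. This is a minimal sketch with illustrative variable names; Python's standard library has no t quantile function, so the critical value 1.833 is hard-coded from the worked solution rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

late_trains = [32, 10, 9, 18, 25, 15, 14, 18, 22, 16]

# t critical value for 9 degrees of freedom and 5% in the upper tail,
# taken from the worked solution (a table lookup, not computed here).
t_crit = 1.833

n = len(late_trains)
x_bar = mean(late_trains)
s = stdev(late_trains)          # sample standard deviation (divisor n - 1)

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.1f}, s = {s:.4f}")
print(f"90% CI = ({lower:.3f}, {upper:.3f})")
```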

    (b) If we did not have the assumption of normality, could we still calculate a confidence interval in this example? If not, suggest a way of overcoming this problem.

    Everything else the same, we could not construct a confidence interval in the same way as in (a), since the t distribution is only valid if the underlying distribution is normal. This problem could be overcome by obtaining a larger sample size and then making use of the central limit theorem (and replacing \(\sigma\) by \(s\)).

  • 5

    4. Reconsider Question 5 of the Week 8 exercises. Would normality be a good approximation for the population distribution of distance traveled by used passenger cars? (Hint: look at the summary statistics and a histogram.) Do you need to assume normality? Redo the 95% confidence interval for the population mean distance traveled by used passenger cars without assuming a known population standard deviation.

    EXCEL summary statistics and histogram for distance traveled indicate nonnormality. The distribution is skewed to the right, the median is much less than the mean, and the sample mean is only 1.35 standard deviations from zero:

    Odometer (km)

    Mean                  78560.83
    Standard Error        5384.86
    Median                67980
    Mode                  147000
    Standard Deviation    58246.19
    Sample Variance       3392618896
    Kurtosis              3.426
    Skewness              1.528
    Range                 315597
    Minimum               403
    Maximum               316000
    Sum                   9191617
    Count                 117

    [Figure: Frequency histogram for odometer readings for cars in Anzac Garage data. Horizontal axis: Odometer (kms), 20000 to 300000. Vertical axis: Frequency, 0 to 45.]

  • 6

    While the population distribution seems nonnormal, the sample size is large enough to invoke the CLT and hence to assume the sample mean is approximately normally distributed. In Question 5 of the Week 8 exercises we assumed \(\sigma\) known, but here we consider the more likely situation where it is unknown and we replace \(\sigma\) by \(s\) as calculated by EXCEL. The 95% confidence interval is given by

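    \[ \bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}} = 78{,}561 \pm 1.96 \times \frac{58{,}246}{\sqrt{117}} = 78{,}561 \pm 10{,}554 = (68{,}007,\ 89{,}115) \]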
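As a check on the Question 4 interval, the same calculation can be run from the EXCEL summary statistics. The sketch below assumes only the mean, standard deviation and count reported for the odometer data; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics reported by EXCEL for the odometer readings.
x_bar = 78560.83
s = 58246.19
n = 117

# How far the sample mean sits from zero, in standard deviations:
# a small value is one sign of right skew in non-negative data.
mean_in_sds = x_bar / s

# Large-sample 95% interval: CLT for the sample mean, with s replacing sigma.
z = NormalDist().inv_cdf(0.975)
margin = z * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"mean is {mean_in_sds:.2f} standard deviations from zero")
print(f"95% CI = ({lower:,.0f}, {upper:,.0f})")
```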

    5. It is known that 80% of people suffering from a particular disease are cured by a certain medication. Calculate the probability that out of a random sample of 400 people with the disease, less than 330 will be cured by using the medication. (Hint: Use the normal approximation and ignore continuity correction).

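    \(p = 0.8\), \(n = 400\) and \(330/400 = 0.825\), so we require \(P(\hat{p} < 0.825)\).

    Here \(np = 320\) and \(n(1-p) = 80\) are both large. Therefore we can use the normal approximation to the binomial, i.e.

    \[ \hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right) \sim N\left(0.8, \frac{0.8 \times 0.2}{400}\right) \]

    So, ignoring the continuity correction:

    \[ P(\hat{p} < 0.825) = P\left(Z < \frac{0.825 - 0.8}{\sqrt{0.8 \times 0.2/400}}\right) = P(Z < 1.25) = 0.8944 \]

    (We could of course also work in terms of the binomial random variable \(X\), calculating \(P(X < 330)\).)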
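The normal approximation in Question 5 can be verified in a few lines. This is a sketch using the standard library's `NormalDist`; as in the worked solution, no continuity correction is applied, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p = 0.8        # known cure rate
n = 400        # sample size
cutoff = 330   # "less than 330 cured"

p_hat_cutoff = cutoff / n                  # 0.825
std_error = sqrt(p * (1 - p) / n)          # sd of the sample proportion
z = (p_hat_cutoff - p) / std_error

# P(sample proportion < 0.825), ignoring the continuity correction.
prob = NormalDist().cdf(z)

print(f"z = {z:.2f}, probability = {prob:.4f}")
```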

  • 7

    6. A unisex hairdressing salon is interested in determining the proportion of its clients who are male (p), as this will influence its advertising strategy. A random sample of 100 of the salon's clients is taken and leads to the calculation of a confidence interval for p of (0.6102, 0.7898).

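    (a) What is the value of the sample proportion on which the reported confidence interval is based?

    Since the confidence interval for the population proportion is always centered around the point estimate, \(\hat{p}\) is always the middle point, i.e.

    \[ \hat{p} = \frac{0.6102 + 0.7898}{2} = 0.7 \]

    (b) What level of confidence was used in the calculation of the reported confidence interval?

    Assuming

    \[ \hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right) \]

    then we have (replacing \(p\) by \(\hat{p}\)):

    \[ (0.6102,\ 0.7898) = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.7 \pm z_{\alpha/2}\sqrt{\frac{0.7 \times 0.3}{100}} \]

    Thus

    \[ 0.0898 = z_{\alpha/2}\sqrt{\frac{0.7 \times 0.3}{100}} = z_{\alpha/2} \times 0.0458 \]

    and

    \[ z_{\alpha/2} = \frac{0.0898}{0.0458} = 1.96 \]

    implying \(\alpha/2 = 0.025\) and hence \(\alpha = 0.05\) or 5%, i.e. a 95% level of confidence.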
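The back-calculation in Question 6 can be reproduced numerically: recover the point estimate and margin of error from the reported interval, then convert the implied z value into a confidence level. This is a minimal sketch with illustrative variable names, using only the standard library.

```python
from math import sqrt
from statistics import NormalDist

lower, upper = 0.6102, 0.7898   # reported confidence interval for p
n = 100                         # sample size

# (a) The interval is centred on the sample proportion.
p_hat = (lower + upper) / 2

# (b) Half the interval width is z * standard error, so solve for z.
margin = (upper - lower) / 2
std_error = sqrt(p_hat * (1 - p_hat) / n)
z = margin / std_error

# Two-sided confidence level implied by z.
confidence = 2 * NormalDist().cdf(z) - 1

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, confidence = {confidence:.1%}")
```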