Mms Testing of Hypothesis


  • 8/6/2019 Mms Testing of Hypothesis

    1/69

    Statistical Estimation

    1. Point and interval estimation

    2. Confidence interval for mean, proportion & variance


    8/8/2011 Lecture24 2

    Introduction

    Everyone makes estimates!

    When you are ready to cross a street, you estimate the speed of any approaching car, the distance between you and that car, and your own speed. Having made these quick estimates, you decide whether to wait, walk, or run.

    All managers must make quick estimates too. The outcomes of these estimates can seriously affect their organizations.


    Introduction

    How do managers use sample statistics to estimate population parameters?

    Statistical estimation methods enable us to estimate the population proportion with reasonable accuracy.

    If all these estimates were obtained on a census basis, it would be a very costly and time-consuming proposition. Hence sampling theory.


    Statistical Estimation

    Statistical estimation is the procedure of using a sample statistic to estimate a population parameter. A statistic used to estimate a parameter is called an estimator, and the value taken by the estimator is called an estimate.

    Statistical estimation is divided into two main categories: point estimation & interval estimation.


    Point estimation

    A point estimate is a single number that is used to estimate an unknown population parameter.

    Example: If a firm takes a sample of 50 salesmen,

    and the average amount of time each salesman spends with his customers is 80 minutes,

    and this figure is used as an estimate of a parameter,

    then 80 is the point estimate.


    Interval estimation

    An estimate of a population parameter given by two numbers between which the parameter may be considered to lie is called an interval estimate of the parameter.

    An interval estimate is a range of values used to estimate a population parameter.

    Example: The average amount of time each salesman spends with his customers is between 60 and 80 minutes.


    Criteria of a Good Estimator

    A good estimate is one which is close to the population parameter being estimated.

    We can evaluate the quality of a statistic as an estimator by using four criteria:

    Unbiasedness

    Consistency

    Efficiency

    Sufficiency


    Unbiasedness

    An estimator is said to be unbiased if the expected value of the estimator is equal to the population parameter being estimated.

    OR

    If the mean of the sampling distribution of the statistic is equal to the population parameter, the statistic is an unbiased estimator.
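As a minimal sketch of what unbiasedness means, the following enumerates every possible sample of size 2 (drawn with replacement) from a small made-up population and checks that the average of all the sample means equals the population mean. The population values and all variable names are illustrative assumptions, not data from the lecture.

```python
from itertools import product
from statistics import mean

# Hypothetical toy population (an assumption, not from the slides).
population = [2, 4, 6, 8]
mu = mean(population)

# Every possible sample of size 2 drawn with replacement; each is equally likely.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

# The expected value of the estimator is its average over all possible samples.
expected_sample_mean = mean(sample_means)

print("population mean:", mu)
print("expected value of the sample mean:", expected_sample_mean)
```

Because the two printed values agree, the sample mean is unbiased for this toy population.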


    Consistency

    As the sample size increases, the difference between the sample statistic and the population parameter should become smaller and smaller. If the difference continues to become smaller and smaller as the sample size becomes larger, the sample statistic is said to converge in probability to the parameter and is said to be a consistent estimator of that parameter.


    Efficiency

    The efficiency of an estimator depends on its variance. If the variance of the estimator is small, the estimate tends to be closer to the parameter value.

    For example, the sample mean and the sample median are both unbiased and consistent estimators of the population mean when the population is symmetric (for instance, normal). Choose between them on the basis of relative efficiency: select the one that has the smaller variance.
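The comparison of the mean and the median can be sketched with a small simulation. This is only an illustration under stated assumptions: a normal population with mean 50 and standard deviation 10, samples of size 25, and 5000 repetitions are all made-up settings, and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 25     # assumed sample size
REPETITIONS = 5000   # assumed number of simulated samples

means, medians = [], []
for _ in range(REPETITIONS):
    # Hypothetical normal population: mean 50, standard deviation 10.
    sample = [random.gauss(50, 10) for _ in range(SAMPLE_SIZE)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Variance of each estimator across the repeated samples.
var_mean = statistics.variance(means)
var_median = statistics.variance(medians)

print(f"variance of sample means:   {var_mean:.3f}")
print(f"variance of sample medians: {var_median:.3f}")
print("more efficient estimator:", "mean" if var_mean < var_median else "median")
```

Under these assumptions the sample mean shows the smaller variance, so it is the more efficient of the two estimators.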


    Sufficiency

    A sufficient estimator is one that uses all the information about the population parameter contained in the sample.

    Example: The sample mean is a sufficient estimator of the population mean, since all the information in the sample is used in its computation. The sample range is not, because it uses only the largest and smallest values.


    Example

    Consider a medical supplies company that produces disposable hypodermic syringes. Each syringe is wrapped in a sterile package and then jumble-packed in a large corrugated carton. Jumble packing causes the cartons to contain differing numbers of syringes. Because the syringes are sold on a per-unit basis, the company needs an estimate of the number of syringes per carton for billing purposes.


    Cont..

    A sample of 35 cartons is taken and the number of syringes in each carton is recorded.

    Obtain the sample mean.

    Sample mean = 102 syringes

    Then we can say that the point estimate of the population mean is 102 syringes per carton.


    Cont..

    The manufactured price of a disposable hypodermic syringe is quite small, so both the buyer and the seller would accept the use of this point estimate of 102 as the basis for billing.

    The manufacturer can save the time and expense of counting each syringe that goes into a carton.


    Interval estimates

    An interval estimate describes a range of values within which a population parameter is likely to lie.


    Example

    Suppose the marketing research director needs an estimate of the average life in months of the car batteries his company manufactures. Select a sample of 200 car owners. Interview these owners and collect data about the life of their batteries.

    Let the mean life = 36 months. If a point estimate is wanted, it is 36 months.


    Cont..

    If the director asks for a statement about the uncertainty that is likely to accompany this estimate, or for a range,

    that can be done by calculating the standard error of the mean as

    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

    Say $\sigma_{\bar{x}} = 0.707$.

    We could now report that our estimate of the life of the company's batteries may lie somewhere in the range of 35.293 to 36.707 months, that is, 36 plus or minus one standard error.


    Confidence Interval

    The probability that we associate with an interval estimate is called the confidence level.

    How confident?

    The most commonly used confidence levels are 90%, 95% & 99%.

    We are free to apply any confidence level.


    Interval estimation: Student's t distribution

    Whenever the sample size is 30 or less and the population standard deviation is not known, use the t distribution.

    In using the t distribution we assume that the population is normally distributed.

    A t distribution is lower at the mean and higher at the tails than a normal distribution.

    There is a separate t distribution for each sample size, that is, for each number of degrees of freedom.


    Degrees of freedom.

    The number of values we can choose freely.


    Example

    As part of the budgeting process for next year, the manager of the Fan point electric generating plant must estimate the coal he will need for this year.

    Last year the plant almost ran out, so he is reluctant to budget for that same amount again. The plant manager took a random sample of 10 plant operating weeks chosen over the last 5 years. It yielded a mean usage of 11400 tons a week and a sample standard deviation of 700 tons a week. Calculate a sensible estimate of the amount to order this year, with 95% confidence.


    n = 10, d.f. = n - 1 = 9

    Sample mean = 11400

    S.D. = 700 (use this as an approximation of the population S.D.)

    Standard error = $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{700}{\sqrt{10}} \approx 221.36$

    From the t-table, for 9 d.f. and a 95% confidence level (total tail area 1.00 - 0.95 = 0.05), the t value = 2.262

    The confidence interval is 11400 ± 2.262 × 221.36,

    that is, 10899 to 11901 tons with 95% confidence.
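The arithmetic of this interval can be checked with a short sketch. The critical value 2.262 is taken from the slide's t-table lookup rather than computed; the variable names are illustrative.

```python
import math

n = 10
sample_mean = 11400.0   # tons per week
sample_sd = 700.0       # tons per week, used in place of the population S.D.
t_critical = 2.262      # t-table value quoted in the slide: 9 d.f., 95% confidence

# Estimated standard error of the mean.
standard_error = sample_sd / math.sqrt(n)

# Interval = sample mean +/- t * standard error.
margin = t_critical * standard_error
lower = sample_mean - margin
upper = sample_mean + margin

print(f"standard error = {standard_error:.2f}")
print(f"95% interval   = {lower:.0f} to {upper:.0f} tons")
```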


    Tests of Hypothesis


    Suppose a manager of a large shopping mall tells us that the average work efficiency of her employees is at least 90%. How can we test the validity of her claim?

    We could calculate the efficiency of a sample of her employees.

    If this sample statistic came out to be 95%, we would accept the manager's statement.

    But if it is 46%, we would reject her assumption as untrue.

    Suppose the sample statistic is 88%. Do we accept or reject?

    We cannot be absolutely certain that our decision is correct.

    Therefore we must learn to deal with uncertainty in our decision making.


    Hypothesis

    Here we wish to test efficiency = 90% (null)

    against the alternative, efficiency ≠ 90% (alternate).

    Or we can say

    null hypothesis H0: μ = 90

    alternate hypothesis H1: μ ≠ 90


    Level of significance

    If we assume the hypothesis is correct, then the significance level will indicate the percentage of sample means that lie outside certain limits.


    Cont..

    The purpose of testing is not to question the computed value of the sample statistic but to make a judgment about the difference between that sample statistic and the tested population parameter.


    Introduction

    A hypothesis is an assumption about a population parameter, to be tested on the basis of sample information.

    Hypothesis tests are widely used in business and industry for making decisions.

    Examples

    Based on sample data, decide whether a new medicine is really effective in curing a disease.

    Decide whether one training procedure is better than another.


    The hypothesis is made about the value of some parameter (the only facts available to estimate the true parameter are those provided by the sample).

    If the sample statistic differs from the hypothesized value of the population parameter, and the difference is significant, then reject the hypothesis.

    If the difference is not significant, then the hypothesis is accepted (more precisely, it is not rejected). Hence tests of hypothesis.


    Procedures of Hypothesis Testing

    Set up a hypothesis

    Set up a suitable significance level

    Determination of a suitable test statistic

    Determination of the critical region

    Doing computations

    Making decisions


    Set up a hypothesis

    Establish the hypothesis to be tested.

    Set up

    the null hypothesis, denoted by H0, &

    the alternate hypothesis, denoted by H1.

    The null hypothesis: there is no true difference between the sample statistic and the population parameter under consideration.


    Set up a hypothesis

    The hypothesis that is different from the null hypothesis is the alternate hypothesis H1.

    If the sample information leads us to reject H0, then accept H1.


    Set up a suitable significance level

    The confidence with which an experimenter rejects or retains the null hypothesis depends on the significance level.

    The level of significance is denoted by α.

    It is generally specified before any sample is drawn, so that the sample results have no influence on the choice.

    In practice, a 5% or 1% level of significance is used.

    5%: there are 5 chances out of 100 that we would reject the null hypothesis when it is true (we are 95% confident of making the right decision).


    When the null hypothesis is rejected at α = 0.05, the result is said to be significant.

    When the null hypothesis is rejected at α = 0.01, the result is said to be highly significant.


    Determination of the critical region

    Determine which values of the test statistic will lead to rejection of H0 and which will lead to acceptance of H0. The former set of values is called the critical region.

    Establishing a critical region is similar to determining a 100(1 - α)% confidence interval.


    Doing computations

    Calculations for step 3


    Making decisions

    Draw statistical conclusions:

    either acceptance of the null hypothesis or rejection of it,

    based on whether the computed value of the test statistic falls in the region of acceptance or the region of rejection.


    Procedures for statistical inferences

    Point estimation.

    Appropriate when the goal is to estimate a population parameter.

    Confidence interval.

    Appropriate when the goal is to estimate a population parameter with confidence.

    Hypothesis testing.

    Hypothesis: a statement about the parameters.

    Appropriate when the goal is to assess whether the evidence provided by the data is in favor of some claim about the population.


    Confidence Interval

    Point estimate +/- margin of error

    Confidence interval for a population mean

    Assumption: the population variance is known.

    Confidence level: C

    $\left( \bar{x} - z^{*} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z^{*} \frac{\sigma}{\sqrt{n}} \right)$
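A minimal sketch of this interval, using only the Python standard library. The function name `z_confidence_interval` is hypothetical. The sample mean 36 and n = 200 echo the battery example, while σ = 10 is an assumption chosen because it reproduces the standard error of 0.707 quoted there.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Return (lower, upper) for a population mean when sigma is known."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Illustrative inputs; sigma = 10 is an assumption, not a value from the slides.
lower, upper = z_confidence_interval(36, 10, 200, 0.95)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Raising the confidence level widens the interval, because a larger z* multiplies the same standard error.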


    Hypothesis Testing

    Sometimes you are not interested in

    estimating an unknown parameter, or

    providing a confidence interval for the parameter.

    But rather, you have some claim (belief) about the parameter and you want to see whether the data

    support the claim, or

    contradict it.


    Concepts of Hypothesis Testing

    The critical concepts of hypothesis testing: two hypotheses

    H0 - the null hypothesis

    The statement of no effect or no difference.

    Ha - the alternative hypothesis

    The statement we hope or suspect is true.

    Usually one would decide on Ha first.


    Biased one-Euro Coin?

    A group of Statistics students spun the Belgian one-Euro coin 250 times, and it came up heads 140 times.

    p: the probability of getting a head on each spin.

    One-sided: H0: p = .5 against Ha: p > .5.

    Two-sided: H0: p = .5 against Ha: p ≠ .5.
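To make the one-sided versus two-sided distinction concrete, here is a hedged sketch that computes, for each alternative, the probability of a result at least as extreme as 140 heads when the coin is fair (the quantity the lecture later calls the P-value). It uses an exact binomial tail, which is a technique swapped in for illustration and not something the slides themselves work through; the helper name `prob_at_least` is hypothetical.

```python
from math import comb

n_spins = 250
heads = 140


def prob_at_least(k, n):
    """P(X >= k) when X counts heads in n spins of a fair coin."""
    # Each outcome sequence has probability 1 / 2**n under H0: p = .5.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# One-sided alternative Ha: p > .5 looks only at the upper tail.
p_one_sided = prob_at_least(heads, n_spins)

# Two-sided alternative Ha: p != .5 counts both tails; a fair coin is
# symmetric, so the two-sided value is twice the one-sided tail.
p_two_sided = 2 * p_one_sided

print(f"one-sided: {p_one_sided:.4f}")
print(f"two-sided: {p_two_sided:.4f}")
```

The same data give weaker evidence against H0 under the two-sided alternative, since deviations in either direction count as extreme.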


    Company Billing System

    A new billing system for a company will be cost-effective only if the mean monthly account is more than $170.

    A sample of 400 monthly accounts has a mean of $178.

    If the accounts are normally distributed with σ = $65, can we conclude that the new system will be cost-effective?

    The population is the credit accounts at the store.

    We want to show that the mean account for all customers is greater than $170. Ha: μ > 170.

    The null hypothesis must specify a single value of the parameter μ. H0: μ = 170.

    How can we achieve that?


    Test statistic

    A test is based on a statistic, which estimates the parameter that appears in the hypotheses.

    Point estimate

    Values of the estimate far from the parameter value in H0 give evidence against H0.

    Ha determines which direction will be counted as far from the parameter value.


    Company Billing System

    Question:

    Is a sample mean of 178 sufficiently greater than 170 to infer that the population mean is greater than 170?

    Answer:

    Let's assume the population mean is 170, and see how likely it is for us to observe a sample mean of 178 or even more.


    P-value

    P-value: the probability of observing a test statistic as extreme as or more extreme than the actually observed value, given that H0 is true.

    "Extreme" means far from what we would expect from H0.

    The P-value provides information about the amount of statistical evidence that supports the null hypothesis.

    The smaller the P-value, the less the evidence for H0.


    Interpreting P-value

    Because the probability that the sample mean is equal to or larger than 178, when μ = 170, is so small (.0069), there are no reasons to believe that μ = 170

    (or, there are reasons to believe that μ > 170).

    We can conclude that the smaller the P-value, the more statistical evidence exists to support the alternative hypothesis.
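The .0069 figure for the billing example can be reproduced with a short sketch, assuming the one-sided alternative Ha: μ > 170 stated earlier. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 170    # mean under H0
x_bar = 178   # observed sample mean
sigma = 65    # known population standard deviation
n = 400       # sample size

standard_error = sigma / sqrt(n)   # 65 / 20 = 3.25
z = (x_bar - mu_0) / standard_error

# Upper-tail probability, because Ha: mu > 170.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```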


    Describing P-value

    If the P-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.

    If the P-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.

    If the P-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.

    If the P-value exceeds 10%, there is no evidence that supports the alternative hypothesis.


    Significance Level α

    We need to make a conclusion after carrying out the hypothesis test. What do we conclude?

    We can compare the P-value with a fixed value that we regard as decisive.

    This amounts to announcing in advance how much evidence against H0 we require in order to reject H0.

    The decisive value is called the significance level of the test. It is denoted by α, and the corresponding test is called a level α test.

    Statistical significance: If the P-value ≤ α, we say that the data are statistically significant at level α.


    α and P-value

    P-value and significance level α:

    Reject H0 if P-value ≤ α.

    Do not reject H0 if P-value > α.

    When is it easier to reject H0? Large α or small α?

    When is the evidence against H0 stronger? Large P-value or small P-value?


    Four steps of hypothesis testing

    Define the hypotheses to test, and the required significance level α.

    Calculate the value of the test statistic.

    Find the P-value based on the observed data.

    State the conclusion: reject the null hypothesis if the P-value ≤ α; otherwise, the data do not provide sufficient evidence to reject the null.


    Testing for normal mean with known σ

    Let X1, …, Xn be a random sample from N(μ, σ²).

    Null hypothesis: H0: μ = μ0

    Alternative hypothesis: Ha: μ ≠ μ0, Ha: μ > μ0, or Ha: μ < μ0


    Normal with known σ: Z test

    When H0 is true, $\mu_{\bar{X}} = \mu_0$ and

    $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$

    has a standard normal distribution.

    Z is a natural measure of the distance between the sample mean $\bar{X}$ and its expected value μ0.

    For a given sample, we observe

    $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}.$

    If H0 is true, we expect z to be close to 0.


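    Normal with known σ

    Case 1: Ha: μ ≠ μ0. H0 should be rejected if z is too far away from 0.

    The P-value is $2P(Z \ge |z|)$.

    Case 2: Ha: μ > μ0. H0 should be rejected if z is much larger than 0.

    The P-value is $P(Z \ge z)$.

    Case 3: Ha: μ < μ0. H0 should be rejected if z is much smaller than 0.

    The P-value is $P(Z \le z)$.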


    Normal with known σ: P-value method

    Null hypothesis: H0: μ = μ0

    Test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

    Alternative hypothesis    P-value
    Ha: μ ≠ μ0                $2P(Z \ge |z|)$
    Ha: μ > μ0                $P(Z \ge z)$
    Ha: μ < μ0                $P(Z \le z)$


    Sprinkler

    A sprinkler systems maker claims that the true average system-activation temperature is 130°. A sample of n = 9 systems, when tested, yields a sample average activation temperature of 131.08°. If the distribution of activation temperature is normal with σ = 1.5°, does the data contradict the claim at significance level α = .01?

    Let μ = true average activation temperature.

    Hypotheses:

    Test statistic:

    P-value:

    Conclusion:
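One way to work through the four blanks above is sketched below. It assumes the two-sided alternative Ha: μ ≠ 130, since the question asks whether the data contradict the claim in either direction. The function name `z_test_p_value` and its `alternative` labels are hypothetical, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(x_bar, mu_0, sigma, n, alternative):
    """Return (z, P-value) for H0: mu = mu_0 when sigma is known.

    alternative: "two-sided", "greater", or "less" (labels are assumptions).
    """
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))   # both tails
    elif alternative == "greater":
        p = 1 - std_normal.cdf(z)              # upper tail
    elif alternative == "less":
        p = std_normal.cdf(z)                  # lower tail
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p


alpha = 0.01
z, p_value = z_test_p_value(131.08, 130, 1.5, 9, "two-sided")
decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"z = {z:.2f}, P-value = {p_value:.4f}: {decision}")
```

Under these assumptions the P-value is larger than α = .01, so the data do not provide sufficient evidence to reject the maker's claim at that level.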


    Rejection Region Method

    The rejection region is a range of values such that if the test statistic falls into that range, the null hypothesis is rejected.

    The rejection region method:

    Define the hypotheses to test, and the required significance level α.

    Find the corresponding rejection region.

    Calculate the test statistic.

    Reject the null hypothesis only if the value of the test statistic falls in the rejection region.


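    Normal with known σ: Rejection Region Method

    Null hypothesis: H0: μ = μ0

    Test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

    Alternative hypothesis    Rejection region for level α test
    Ha: μ ≠ μ0                $|z| \ge z_{\alpha/2}$
    Ha: μ > μ0                $z \ge z_{\alpha}$
    Ha: μ < μ0                $z \le -z_{\alpha}$

    Here $z_{\alpha}$ denotes the value with upper-tail area α under the standard normal curve.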


    Sprinkler

    A sprinkler systems maker claims that the true average system-activation temperature is 130°. A sample of n = 9 systems, when tested, yields a sample average activation temperature of 131.08°. If the distribution of activation temperature is normal with σ = 1.5°, does the data contradict the claim at significance level α = .01?

    Let μ = true average activation temperature.

    1 Hypotheses:

    2 Rejection region:

    3 Test statistic:

    4 Conclusion:
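The four steps above can be sketched as follows, again assuming the two-sided alternative Ha: μ ≠ 130, so that the rejection region is |z| at or beyond the critical value with α/2 in each tail. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.01
mu_0, x_bar, sigma, n = 130, 131.08, 1.5, 9

# Step 2: two-sided rejection region is |z| >= z_critical,
# where z_critical has upper-tail area alpha / 2.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Step 3: test statistic.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 4: reject H0 only if z falls in the rejection region.
in_rejection_region = abs(z) >= z_critical

print(f"critical value = {z_critical:.3f}, z = {z:.2f}")
print("reject H0" if in_rejection_region else "do not reject H0")
```

The conclusion agrees with the P-value method, as it must: both compare the same z against the same α.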


    Sprinkler Revisited

    A sprinkler systems maker claims that the true average system-activation temperature is 130°. A sample of n = 9 systems, when tested, yields a sample average activation temperature of 131.08°. If the distribution of activation temperature is normal with σ = 1.5°, does the data contradict the claim at significance level α = .01?

    What's the 99% confidence interval for the activation temperature?
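A hedged sketch of the 99% interval for the mean activation temperature, using the known-σ formula from earlier in the lecture. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 131.08, 1.5, 9
confidence = 0.99
claimed_mean = 130

# z* leaves 0.5% in each tail for a 99% interval.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

contains_claim = lower <= claimed_mean <= upper
print(f"99% CI: ({lower:.2f}, {upper:.2f})")
print(f"contains the claimed mean {claimed_mean}: {contains_claim}")
```

The claimed value of 130° lies inside the interval, which matches the decision not to reject H0 at α = .01.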


    CI & 2-Sided Tests

    A level α 2-sided test rejects H0: μ = μ0 exactly when the value μ0 falls outside a level 1 - α confidence interval for μ.

    Confidence intervals can be used to test hypotheses.

    Calculate the level 1 - α confidence interval; then, if μ0 falls within the interval, do not reject the null hypothesis. Otherwise, reject the null hypothesis.


    SAT

    In a discussion of SAT scores, someone comments: "Because only a minority of students take the test, the scores overestimate the ability of typical seniors. The mean SAT-M score is about 475, but I think if all seniors took the test, the mean would be 450."

    You gave the test to an SRS of 500 seniors from California. These students had an average score of 461. (The SAT-M score follows a normal distribution with a standard deviation of 100.)

    Is there sufficient evidence against the claim that the mean for all California seniors is 450 at a significance level of 0.05?

    Give a 95% CI for the mean score μ of all seniors.


    SAT

    The hypotheses are

    The test statistic is

    Because Ha is two-sided, the P-value is

    Conclusion:

    A 95% confidence interval for μ is
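The blanks in this exercise can be filled in numerically with a sketch like the one below, which takes H0: μ = 450 against the two-sided Ha: μ ≠ 450 and also checks that the P-value decision and the confidence-interval decision agree. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, x_bar, sigma, n, alpha = 450, 461, 100, 500, 0.05
std_normal = NormalDist()

standard_error = sigma / sqrt(n)
z = (x_bar - mu_0) / standard_error

# Two-sided P-value: both tails beyond |z|.
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# 95% confidence interval for mu.
z_star = std_normal.inv_cdf(1 - alpha / 2)
ci = (x_bar - z_star * standard_error, x_bar + z_star * standard_error)

# The two decision rules should always agree for a 2-sided test.
reject_by_p_value = p_value <= alpha
reject_by_ci = not (ci[0] <= mu_0 <= ci[1])

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"reject H0 (P-value rule): {reject_by_p_value}; reject H0 (CI rule): {reject_by_ci}")
```

Under these assumptions both rules reject H0 at the 0.05 level: 450 falls below the lower end of the interval.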


    Take Home Message

    Tests of significance: when to use them

    Two hypotheses: null and alternative

    Test for a population mean with known σ

    Test statistic

    P-value

    Significance level α

    P-value method (4 steps)

    Rejection region method

    CI and 2-sided test


    Homework 12.1

    Reading in Text 435-452

    Exercises in Text

    6.32, 6.36, 6.44, 6.48, 6.52, 6.56

    Due Time

    Thursday, April 28