Statistics 2[1]- Notes


  • 8/7/2019 Statistics 2[1]- Notes


    Statistics 2

    Binomial Distribution

    A spinner is divided into four equal-sized sections marked 1, 2, 3, 4. If the spinner is spun 6 times, how likely is it to land on 1 on exactly four occasions?

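    One possible sequence would be 1 1 1 1 N N, where N denotes a spin that does not land on 1.

    The number of possible sequences is $^6C_4$ or $\binom{6}{4}$, which equals 15.

    Each sequence has probability $0.25^4 \times 0.75^2$.

    So the required probability is $\binom{6}{4} \times 0.25^4 \times 0.75^2 = 15 \times 0.25^4 \times 0.75^2 \approx 0.0330$.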
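    The same calculation can be reproduced in Python as a quick check. This is a minimal sketch using only the standard library (`math.comb` needs Python 3.8 or later); the helper name `binomial_pmf` is our own.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p): number of sequences times the probability of each one."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Spinner spun 6 times; "success" means landing on 1, so p = 1/4.
n_sequences = comb(6, 4)           # ways to place the four 1s among six spins
each_sequence = 0.25**4 * 0.75**2  # probability of any single such sequence
answer = binomial_pmf(4, 6, 0.25)

print(n_sequences, round(answer, 4))
```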

    A binomial distribution arises when the following conditions are met:

    - an experiment is repeated a fixed number (n) of times (i.e., there is a fixed number of trials);
    - the outcomes from the trials are independent of one another;
    - each trial has two possible outcomes (referred to as success and failure);
    - the probability of a success (p) is constant.

    If the above conditions are satisfied and X is the random variable for the number of successes, then X has a binomial distribution. We write:

    $X \sim \mathrm{B}(n, p)$

    where n = number of trials and p = probability of success.

    Interpreting certain phrases correctly is critical, especially when dealing with discrete distributions (binomial and Poisson).

    Phrase            Means     To use tables
    Greater than 5    X > 5     1 - P(X ≤ 5)
    At least 7        X ≥ 7     1 - P(X ≤ 6)
    Fewer than 10     X < 10    P(X ≤ 9)
    No more than 3    X ≤ 3     P(X ≤ 3)
    At most 8         X ≤ 8     P(X ≤ 8)
    Exactly 4         X = 4     P(X ≤ 4) - P(X ≤ 3)


    Example:

    a) P(X = 3)

    Using tables:

    b) P(X > 1)

    Example:

    The probability that a baby is born a boy is 0.51. A midwife delivers 10 babies. Find:

    a) The probability that exactly 4 are male;

    b) The probability that at least 8 are male.
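    The phrase table above translates directly into code. This sketch (our own helper names, standard library only) works the babies example with X ~ B(10, 0.51), writing each answer in terms of the cumulative probabilities P(X ≤ r) that tables provide.

```python
from math import comb


def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def binomial_cdf(r, n, p):
    """P(X <= r), the quantity tabulated in statistical tables."""
    return sum(binomial_pmf(k, n, p) for k in range(r + 1))


n, p = 10, 0.51  # X = number of boys among 10 babies

# a) "exactly 4": P(X = 4) = P(X <= 4) - P(X <= 3)
exactly_4 = binomial_cdf(4, n, p) - binomial_cdf(3, n, p)

# b) "at least 8": P(X >= 8) = 1 - P(X <= 7)
at_least_8 = 1 - binomial_cdf(7, n, p)

print(round(exactly_4, 4), round(at_least_8, 4))
```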


    It can be shown that if $X \sim \mathrm{B}(n, p)$, then $E[X] = np$ and $\mathrm{Var}[X] = np(1-p)$.
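    These formulas can be checked numerically by summing r·P(X = r) and r²·P(X = r) over every outcome. The sketch below borrows n = 10 and p = 0.51 from the babies example; any other values would do.

```python
from math import comb


def binomial_mean_var(n, p):
    """Compute E[X] and Var[X] from the definitions, summing over all outcomes 0..n."""
    pmf = [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(pmf))
    second_moment = sum(r * r * pr for r, pr in enumerate(pmf))
    return mean, second_moment - mean**2


mean, var = binomial_mean_var(10, 0.51)
print(mean, 10 * 0.51)           # compare with np
print(var, 10 * 0.51 * 0.49)     # compare with np(1 - p)
```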

    Poisson Distribution

    A random variable X which counts the number of times an event occurs in a given unit of space or time will have a Poisson distribution if:

    - the events occur independently of each other and at random;
    - the events occur at a constant rate;
    - the events occur singly (one at a time).

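    The notation used to indicate that a random variable X has a Poisson distribution is $X \sim \mathrm{Po}(\lambda)$.

    The distribution is fully specified by a single parameter $\lambda$.

    If $X \sim \mathrm{Po}(\lambda)$ then

    $P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$

    for $x = 0, 1, 2, \ldots$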

    Example:

    SupposeX~ Po( ). Find .


    Example:

    On average a call centre receives 1.75 phone calls per minute.

    a) Assuming a Poisson distribution, find the probability that the number of phone calls received in a randomly chosen minute is:

    (i) exactly 4;
    (ii) no more than 2.

    Let X = number of phone calls received in 1 minute.

    Then X ~ Po(1.75).

    b) Find the probability that 6 phone calls are received in a 4 minute period.

    Let Y = number of phone calls received in 4 minutes.

    The number of calls in 4 minutes will be on average 4 × 1.75 = 7.

    So Y ~ Po(7).
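    The notes do not give numerical answers for this example, so here is a minimal Python sketch that evaluates the Poisson formula directly (helper names are our own). Part (b) relies on the rate scaling with the length of the interval.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x, lam):
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


# a) X ~ Po(1.75): calls in one minute
p_exactly_4 = poisson_pmf(4, 1.75)
p_at_most_2 = poisson_cdf(2, 1.75)

# b) The mean scales with the interval: 4 minutes * 1.75 calls per minute = 7
p_six_in_four_minutes = poisson_pmf(6, 4 * 1.75)

print(round(p_exactly_4, 4), round(p_at_most_2, 4), round(p_six_in_four_minutes, 4))
```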


    Approximating a Binomial by a Poisson

    If X ~ B(n, p), then X can reasonably be approximated by a Poisson distribution with mean np if:

    - n is large;
    - p is small.

    Two frequently used rules of thumb are:

    - n > 50 and np < 5;
    - n > 50 and p < 0.1.

    Example:

    A drug manufacturer has found that 2% of the patients taking a particular drug will experience a particular side effect.

    A hospital consultant prescribes the drug to 150 of her patients.

    Using a suitable approximation, calculate the probability that:

    a) none of her patients suffer from the side effects;
    b) no more than 5 suffer from the side effects.

    Let X represent the number of patients experiencing side effects.

    The exact distribution of X is X ~ B(150, 0.02).

    Since n is large and p is small, X ≈ Po(150 × 0.02).

    So, X ≈ Po(3).

    a) $P(X = 0) = e^{-3} = 0.0498$ (tables)

    b) $P(X \le 5) = 0.9161$ (tables)
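    To see how good the approximation is here, this sketch compares exact binomial probabilities for B(150, 0.02) with the Po(3) values (standard library only; helper names are illustrative).

```python
from math import comb, exp, factorial


def binomial_cdf(r, n, p):
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


def poisson_cdf(r, lam):
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(r + 1))


n, p = 150, 0.02
lam = n * p  # Poisson mean np = 3

for r in (0, 5):
    exact = binomial_cdf(r, n, p)
    approx = poisson_cdf(r, lam)
    print(f"P(X <= {r}): exact {exact:.4f}, Poisson {approx:.4f}")
```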


    Continuous random variables

    A probability density function (p.d.f.) is a curve that models the shape of the distribution corresponding to a continuous random variable.

    If $f(x)$ is the p.d.f. corresponding to a continuous random variable X and if $f(x)$ is defined for $a \le x \le b$, then the following properties must hold:

    1. The total area under a p.d.f. is 1: $\int_a^b f(x)\,dx = 1$.

    2. The graph of the p.d.f. never dips below the x-axis: $f(x) \ge 0$ for all x.

    3. Probabilities correspond to the area under the curve: $P(c \le X \le d) = \int_c^d f(x)\,dx$.
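    The three properties can be checked numerically. This sketch uses a hypothetical p.d.f., f(x) = 3x²/8 on 0 ≤ x ≤ 2, which is not one from these notes, and approximates areas with composite Simpson's rule (exact for polynomials up to degree 3, apart from rounding).

```python
def f(x):
    """Hypothetical p.d.f.: f(x) = 3x^2/8 for 0 <= x <= 2, and 0 elsewhere."""
    return 3 * x**2 / 8 if 0 <= x <= 2 else 0.0


def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule for the area under g between lo and hi (n must be even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


grid = [i / 100 for i in range(-100, 301)]
print(all(f(x) >= 0 for x in grid))  # property 2: never below the x-axis
print(simpson(f, 0, 2))              # property 1: total area is 1
print(simpson(f, 0, 1))              # property 3: P(X < 1) as an area, 1/8 here
```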


    Mode

    Suppose that a random variable X is defined by the probability density function $f(x)$ for $a \le x \le b$.

    The mode of X is the value of x that produces the largest value of $f(x)$ in the interval $a \le x \le b$.

    A sketch of the probability density function can be very helpful when determining the mode.

    Example:

    A random variable X has p.d.f. , where

    Find the mode.

    The mode can be found using differentiation:


    To find the turning point we solve $f'(x) = 0$, which gives two possible values of x.

    The point is a maximum if $f''(x) < 0$.

    So the mode is the value of x at this maximum.

    Cumulative distribution functions

    The c.d.f. is found by integrating the p.d.f.: for a p.d.f. $f(x)$ defined on $a \le x \le b$, $F(x) = P(X \le x) = \int_a^x f(t)\,dt$.

    Example:

    A random variable X has p.d.f. $f(x) = \frac{1}{6}(x^3 + 1)$, where $0 \le x \le 2$.

    Find the c.d.f. and find P(X < 1).


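    $$F(x) = \begin{cases} 0 & x < 0 \\ \frac{1}{24}x^4 + \frac{1}{6}x & 0 \le x \le 2 \\ 1 & x > 2 \end{cases}$$

    $P(X < 1) = F(1) = \frac{1}{24} + \frac{1}{6} = \frac{5}{24} \approx 0.208$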
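    As a check on this c.d.f., the sketch below integrates the p.d.f. numerically from 0 to x and compares the result with the formula. It assumes f(x) = (x³ + 1)/6 on 0 ≤ x ≤ 2, the derivative of F(x) = x⁴/24 + x/6.

```python
def f(x):
    """p.d.f. assumed for this example: f(x) = (x^3 + 1)/6 on 0 <= x <= 2."""
    return (x**3 + 1) / 6 if 0 <= x <= 2 else 0.0


def F(x):
    """c.d.f. obtained by integrating f from 0 up to x."""
    if x < 0:
        return 0.0
    if x > 2:
        return 1.0
    return x**4 / 24 + x / 6


def simpson(g, lo, hi, n=1000):
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


# Integrating the p.d.f. numerically should reproduce F at points in [0, 2].
for x in (0.5, 1.0, 1.5, 2.0):
    print(x, F(x), simpson(f, 0, x))

print(F(1))  # P(X < 1) = 5/24
```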

    Median and Quartiles

    The median of a random variable X is defined to be the value m such that

    $F(m) = 0.5$

    where F is the cumulative distribution function of X.

    Likewise the lower quartile $Q_1$ is the solution to the equation $F(Q_1) = 0.25$, and the upper quartile $Q_3$ is the solution to $F(Q_3) = 0.75$.

    Example:

    A random variable X is defined by the cumulative distribution function:

    $$F(x) = \begin{cases} 0 & x < 2 \\ \frac{1}{24}(x^2 + x - 6) & 2 \le x \le 5 \\ 1 & x > 5 \end{cases}$$

    a) Calculate and sketch the probability density function.

    b) Find the median value.

    c) Work out

    The p.d.f. is found by differentiating the c.d.f.

    Sketch of f(x)


    Median

    The median m satisfies F(m) = 0.5, which gives a quadratic equation in m with two roots.

    The median must be the root that lies in the interval [2, 5].
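    When F(m) = 0.5 is awkward to solve by hand, bisection finds the median (or a quartile) numerically, because a c.d.f. never decreases. This sketch assumes the c.d.f. of the example is F(x) = (x² + x - 6)/24 on 2 ≤ x ≤ 5, which is our reading of the formula above; `solve_cdf` is a hypothetical helper.

```python
from math import sqrt


def F(x):
    """c.d.f. assumed for this sketch: F(x) = (x^2 + x - 6)/24 on 2 <= x <= 5."""
    if x < 2:
        return 0.0
    if x > 5:
        return 1.0
    return (x**2 + x - 6) / 24


def solve_cdf(target, lo=2.0, hi=5.0, tol=1e-12):
    """Bisection: F is increasing on [lo, hi], so keep the half containing F(x) = target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


median = solve_cdf(0.5)
lower_q, upper_q = solve_cdf(0.25), solve_cdf(0.75)

# F(m) = 0.5 rearranges to m^2 + m - 18 = 0; keep the root inside [2, 5].
root = (-1 + sqrt(73)) / 2
print(round(median, 3), round(root, 3), round(lower_q, 3), round(upper_q, 3))
```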

    Expectation

    If X is a continuous random variable defined by the probability density function $f(x)$ over the domain $a \le x \le b$, then the mean or expectation of X is given by

    $E[X] = \int_a^b x f(x)\,dx$


    Note: If the p.d.f. is symmetrical, then the expected value of X will be the value corresponding to the line of symmetry.

    Example:

    A random variable X is defined by the probability density function

    Calculate E[X] and E[1/X].

    Variance

    If X is a continuous random variable defined by the probability density function $f(x)$ over the domain $a \le x \le b$, then the variance of X is given by

    $\mathrm{Var}[X] = E[X^2] - (E[X])^2$

    where

    $E[X^2] = \int_a^b x^2 f(x)\,dx$
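    Both integrals can be approximated numerically. This sketch uses a hypothetical p.d.f., f(x) = 3x²/8 on 0 ≤ x ≤ 2 (not one of the examples in these notes), for which E[X] = 1.5, E[X²] = 2.4 and Var[X] = 0.15 exactly.

```python
def f(x):
    """Hypothetical p.d.f.: f(x) = 3x^2/8 on 0 <= x <= 2."""
    return 3 * x**2 / 8


def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule (n even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


a, b = 0, 2
mean = simpson(lambda x: x * f(x), a, b)       # E[X] = integral of x f(x)
second = simpson(lambda x: x**2 * f(x), a, b)  # E[X^2] = integral of x^2 f(x)
variance = second - mean**2                    # Var[X] = E[X^2] - (E[X])^2
print(mean, second, variance)
```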

    Example:

    A continuous random variable Y has a probability density function where

    Calculate the value of Var[Y].


    Sketch of f(y)

    The p.d.f. is symmetrical. Therefore E[Y] is the value of y at the line of symmetry.

    Examination-style question :

    The mass, X kg, of luggage taken on board an aircraft by a passenger can be modelled by the probability density function

    a) Sketch the probability density function and find the value of k.

    b) Verify that the median weight of luggage is about 20.586 kg.

    c) Find the mean and variance of X.


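    To find k we use the fact that the total area under the p.d.f. is 1: $\int f(x)\,dx = 1$ over the domain of X.

    To verify that the median is about 20.586, we need to check that $F(20.586) \approx 0.5$.

    Therefore Var[X] = E[X²] - (E[X])² = 428.5714 - 20² = 28.57 (28.6 to 3 s.f.)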


    Continuous Uniform Distribution

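    A random variable X is said to have a continuous uniform distribution (or rectangular distribution) over the interval [a, b] if its probability density function has the form:

    $$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

    The graph of the p.d.f. is a rectangle of width b - a and height 1/(b - a), so its area is 1.

    If X has a continuous uniform distribution over the interval [a, b], then

    $E[X] = \dfrac{a+b}{2}$ and $\mathrm{Var}[X] = \dfrac{(b-a)^2}{12}$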

    Example :

    A random variable Y has a continuous uniform distribution in the interval [2,8]. Find .


    Examination-style question:

    A random variable X is given by the probability density function $f(x) = \frac{1}{10}$, where $5 < x < 15$.

    Find:

    a) E[X] and Var[X]

    b)

    X has a uniform distribution over the interval (5, 15).

    The p.d.f. for X is shown on the diagram below.

    The probability we require is shaded.

    So,


    If X has a uniform distribution over the interval (a, b) then the cumulative distribution function of X is:

    $$F(x) = P(X \le x) = \begin{cases} 0 & x < a \\ \dfrac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}$$
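    The uniform formulas are simple enough to code directly. This sketch uses the interval (5, 15) from the examination-style question; the probability P(6 < X < 8) is only an illustration of using the c.d.f., not part of the original question.

```python
def uniform_pdf(x, a, b):
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """F(x) = 0 below a, (x - a)/(b - a) on [a, b], and 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def uniform_mean_var(a, b):
    return (a + b) / 2, (b - a) ** 2 / 12


a, b = 5, 15
mean, var = uniform_mean_var(a, b)
print(uniform_pdf(7, a, b), mean, var)
print(uniform_cdf(8, a, b) - uniform_cdf(6, a, b))  # P(6 < X < 8), approximately 0.2
```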

    Approximating a binomial using a normal

    Calculating probabilities using the binomial distribution can be cumbersome if the number of trials (n) is large.

    Consider this example:

    10% of people in the United Kingdom are left handed.

    A school has 1200 students. Find the probability that more than 140 of them are left handed.

    Let the number of left-handed people in the school be X.

    Then X ~ B[1200, 0.1].

    The required probability is

    P(X > 140) = P(X = 141) + P(X = 142) + ... + P(X = 1200)

    As no tables exist for this distribution, calculating this probability by hand would be a mammoth task.

    A further problem arises if you attempt to work out one of these probabilities, for example P(X = 141):

    $P(X = 141) = \binom{1200}{141} \times 0.1^{141} \times 0.9^{1059}$

    Calculators cannot evaluate the coefficient $\binom{1200}{141}$: it is too large!


    One way forward is to approximate the binomial distribution using a normal distribution.

    If X ~ B(n, p) where n is large and p is not too close to 0 or 1, then X can reasonably be approximated using a normal distribution:

    $X \approx \mathrm{N}(np, npq)$

    where q = 1 - p.

    There is a widely used rule of thumb that can be applied to tell you when the approximation will be reasonable: np > 5 and nq > 5.

    Continuity Correction

    A continuity correction must be applied when approximating a discrete distribution (such as the binomial) by a continuous distribution (such as the normal).

    Exact distribution: B(n, p)

    Approximate distribution: N[np, npq]

    Introductory example:

    10% of people in the United Kingdom are left handed. A school has 1200 students. Find the probability that more than 140 of them are left handed.

    Let the number of left-handed people in the school be X.

    Then X ~ B[1200, 0.1].

    Since np = 120 > 5 and nq = 1080 > 5, we can approximate the distribution using a normal distribution:

    X ≈ N[120, 108], since npq = 1200 × 0.1 × 0.9 = 108.

    So P(X > 140) ≈ P(X > 140.5) (using a continuity correction).

    Standardize: $z = \dfrac{140.5 - 120}{\sqrt{108}} = 1.973$

    Therefore P(X > 140.5) = P(Z > 1.973)

    = 1 - P(Z ≤ 1.973) = 1 - 0.9758

    = 0.0242

    Note: A binomial distribution can be approximated reasonably well by a normal distribution provided that np > 5 and nq > 5.
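    With software, the "too large to calculate" obstacle disappears if the arithmetic is done on the log scale. This sketch (standard library only; `statistics.NormalDist` needs Python 3.8+) computes the exact binomial answer alongside the normal approximation; the two are close but not identical.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binomial_pmf(r, n, p):
    """Work on the log scale so coefficients such as C(1200, 141) never overflow."""
    log_coeff = lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)
    return exp(log_coeff + r * log(p) + (n - r) * log(1 - p))


n, p = 1200, 0.1
exact = 1 - sum(binomial_pmf(r, n, p) for r in range(141))  # 1 - P(X <= 140)

mu, sigma = n * p, sqrt(n * p * (1 - p))                    # N(120, 108)
approx = 1 - NormalDist(mu, sigma).cdf(140.5)               # continuity correction

print(round(exact, 4), round(approx, 4))
```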

    Examination-style question:

    A sweet manufacturer makes sweets in 5 colours. 25% of the sweets it produces are red.

    The company sells its sweets in tubes and in bags. There are 10 sweets in a tube and 28 sweets in a bag.

    It can be assumed that the sweets are of random colours.

    a) Find the probability that there are more than 4 red sweets in a tube.

    b) Using a suitable approximation, find the probability that a bag of sweets contains between 5 and 12 red sweets (inclusive).

    Let the number of red sweets in a tube be X.

    Then the exact distribution for X is X ~ B[10, 0.25].

    P(X > 4) = 1 - P(X ≤ 4)

    = 1 - 0.9219

    = 0.0781

    Let the number of red sweets in a bag be Y.

    Then the exact distribution for Y is Y ~ B[28, 0.25].

    The distribution can be approximated by a normal since np = 7 and nq = 21 (both greater than 5):

    Y ≈ N[7, 5.25]

    P(5 ≤ Y ≤ 12) ≈ P(4.5 < Y < 12.5) (using a continuity correction)

    Standardize: $\dfrac{4.5 - 7}{\sqrt{5.25}} = -1.091$ and $\dfrac{12.5 - 7}{\sqrt{5.25}} = 2.400$

    P(-1.091 < Z < 2.400)

    = P(Z < 2.400) - P(Z < -1.091)

    = P(Z < 2.400) - (1 - P(Z < 1.091))

    = 0.9918 - (1 - 0.8623) = 0.8541
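    Both parts can be checked in Python. This sketch computes the exact binomial answers and the normal approximation used in part (b); it relies only on the standard library, and the helper name is our own.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(r, n, p):
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


# a) Tube: X ~ B(10, 0.25); "more than 4" is 1 - P(X <= 4)
more_than_4 = 1 - binomial_cdf(4, 10, 0.25)

# b) Bag: Y ~ B(28, 0.25); exact answer is P(Y <= 12) - P(Y <= 4)
exact_b = binomial_cdf(12, 28, 0.25) - binomial_cdf(4, 28, 0.25)

# Normal approximation N(7, 5.25) with continuity correction: P(4.5 < Y < 12.5)
Y = NormalDist(7, sqrt(5.25))
approx_b = Y.cdf(12.5) - Y.cdf(4.5)

print(round(more_than_4, 4), round(exact_b, 4), round(approx_b, 4))
```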


    Approximating the Poisson using a normal

    If $X \sim \mathrm{Po}(\lambda)$ and $\lambda$ is large, then X is approximately normally distributed: $X \approx \mathrm{N}(\lambda, \lambda)$.

    Recall that the mean and variance of a Poisson distribution are equal.

    There is a widely used rule of thumb that can be applied to tell you when the approximation will be reasonable:

    Note: A continuity correction is required because we are approximating a discrete distribution using a continuous one.

    Examination-style question:

    An electrical retailer has estimated that he sells a mean number of 5 digital radios each week.

    a) Assuming that the number of digital radios sold in any week can be modelled by a Poisson distribution, find the probability that the retailer sells fewer than 2 digital radios in a randomly chosen week.

    b) Use a suitable approximation to decide how many digital radios he should have in order for him to be at least 90% certain of being able to meet the demand for radios over the next 5 weeks.

    Let X represent the number of digital radios sold in a week.

    So X ~ Po(5).

    P(X < 2) = P(X ≤ 1)

    = 0.0404

    Let Y represent the number of digital radios sold in a period of 5 weeks.

    Then Y ~ Po(25), which can be approximated by N(25, 25).

    A Poisson distribution can be approximated reasonably well by a normal distribution provided λ is large.


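    We need the value y such that P(Y ≤ y) = 0.9.

    P(Y ≤ y) ≈ P(W < y + 0.5), where W ~ N(25, 25) (using a continuity correction)

    So, $\dfrac{y + 0.5 - 25}{5} = 1.282$, which gives y = 30.91.

    Rounding up, the retailer would need to keep 31 digital radios in stock.

    The upper 10% point of the standard normal distribution is 1.282.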
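    A short Python sketch of both parts, using `statistics.NormalDist().inv_cdf` for the upper 10% point instead of tables (so it uses 1.2816 rather than 1.282).

```python
from math import ceil, exp, factorial
from statistics import NormalDist

lam_week = 5
# a) X ~ Po(5): P(X < 2) = P(X <= 1)
fewer_than_2 = sum(exp(-lam_week) * lam_week**k / factorial(k) for k in range(2))

# b) Y ~ Po(25) over 5 weeks, approximated by N(25, 25)
lam = 5 * lam_week
z = NormalDist().inv_cdf(0.9)       # upper 10% point of the standard normal
y = lam + z * lam**0.5 - 0.5        # solve (y + 0.5 - 25)/5 = z
stock = ceil(y)                     # round up so that P(Y <= stock) >= 0.9

print(round(fewer_than_2, 4), round(y, 2), stock)
```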


    A statistic is a quantity calculated solely from the observations in a sample.

    Populations and samples

    Examples:

    A head teacher is interested in finding out how long her sixth form students spend in part-time employment per week.

    The population is the set of all sixth form students in her school.

    The sampling frame would be the registers of sixth form tutor groups.

    Carrying out a census of the entire population is usually not feasible or sensible.

    Advantages of taking a census are:

    - every single member of the population is used;
    - it is unbiased;
    - it gives an accurate answer.

    Population is the set of all individuals or objects that we wish to study.

    Census is an investigation in which information is obtained from every member of the population.

    Sampling frame is a list of all members of the population.

    Sampling unit is an individual member of a population.

    Sample is a selection of individual members or items from a population.


    Disadvantages of taking a census are:

    - it costs a lot of money;
    - it takes a lot of time;
    - it uses a lot of resources.

    Instead of surveying the whole population, information can instead be obtained from a sample.

    The sampling process should be undertaken carefully to ensure that the sample is representative of the entire population.

    Bias can occur if one section of the population is over/under represented.

    A simple random sample of size n consists of the observations $X_1, X_2, \ldots, X_n$ from a population, where the $X_i$:

    - are independent random variables;
    - have the same distribution as the population.

    Example:

    A large bag of coins contains 1p, 2p and 5p coins in the ratio 2:1:3.

    a) Find the mean, μ, and the variance, σ², for the population of coins.

    b) A random sample of 3 coins is taken from this population. List all the possible outcomes.

    Let X be the value of the coin chosen.

    Distribution of the population:

    x           1     2     5
    P(X = x)    1/3   1/6   1/2

    A sample is random if every member of the population has the same probability of being chosen.


    The possible outcomes and the mean:

    (1,1,1) 1

    (1,1,2) (1,2,1) (2,1,1) 4/3

    (2,2,1) (2,1,2) (1,2,2) 5/3

    (2,2,2) 2

    (1,1,5) (1,5,1) (5,1,1) 7/3

    (5,5,1) (5,1,5) (1,5,5) 11/3

    (5,5,5) 5

    (2,2,5) (2,5,2) (5,2,2) 3

    (5,5,2) (5,2,5) (2,5,5) 4

    (1,2,5) (1,5,2) (2,1,5) (2,5,1) (5,1,2) (5,2,1) 8/3

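    Working out, e.g. for (1,1,2): mean = (1 + 1 + 2)/3 = 4/3

    The sampling distribution of the mean is:

    x̄            1     4/3   5/3   2      7/3   8/3   3      11/3   4     5
    P(X̄ = x̄)     1/27  1/18  1/36  1/216  1/6   1/6   1/24   1/4    1/8   1/8

    Multiply by 3 where there are 3 different arrangements of the same coins, e.g. P(X̄ = 4/3) = 3 × (1/3)² × (1/6) = 1/18.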
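    Listing all 27 ordered samples by hand is error-prone, so here is a sketch that enumerates them with exact fractions. It also computes μ and σ² for part (a), which the notes do not work through; the variable names are our own.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population of coin values: 1p, 2p, 5p in the ratio 2:1:3
population = {1: Fraction(2, 6), 2: Fraction(1, 6), 5: Fraction(3, 6)}

mu = sum(x * p for x, p in population.items())
sigma2 = sum(x * x * p for x, p in population.items()) - mu**2

# Enumerate all 3^3 = 27 ordered samples; arrangements of the same coins add up automatically.
sampling_dist = defaultdict(Fraction)
for sample in product(population, repeat=3):
    prob = population[sample[0]] * population[sample[1]] * population[sample[2]]
    sampling_dist[Fraction(sum(sample), 3)] += prob

for mean_value in sorted(sampling_dist):
    print(mean_value, sampling_dist[mean_value])
print("mu =", mu, "sigma^2 =", sigma2)
```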


    Hypothesis Testing

    The null hypothesis (H0) is the hypothesis we assume to be correct unless there is sufficient evidence to reject it.

    The alternative hypothesis (H1) states how we believe the parameter differs from the value in H0; it is what we conclude if H0 is rejected.

    Steps required to answer hypothesis test questions in an examination are:

    Step 1: Write out H0 and H1 in mathematical terms.

    Step 2: State the significance level; if none is mentioned in the question, it is usual to choose 5%.

    Step 3: State the distribution, assuming the null hypothesis to be true.

    Step 4: Calculate the probability (under H0) of obtaining results as extreme as those collected.

    Step 5: Compare the probability with the significance level and draw conclusions: can H0 be rejected or not? Interpret your results in context.

    Hypothesis Testing for the Binomial Distribution

    Lower One Tail Test

    Example:

    Is an ordinary six-sided die fair if only 1 six is thrown in 24 throws?

    Test at the 5% level of significance.

    Let X be the random variable for the number of 6s thrown in 24 throws.

    Therefore X ~ B[24, 1/6]


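    H0: p = 1/6

    H1: p < 1/6

    $P(X \le 1) = \left(\tfrac{5}{6}\right)^{24} + 24 \times \tfrac{1}{6} \times \left(\tfrac{5}{6}\right)^{23} = 0.0730 > 0.05$

    Accept H0: there is insufficient evidence to suggest that the die is biased against sixes, so it is reasonable to treat it as fair.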

    Upper One Tail Test

    Example:

    In Luigi's restaurant, on average 1 in 10 people order a bottle of Chardonnay. Out of a sample of 50, 11 chose Chardonnay. Has the drink become more popular?

    Test at the 1% level of significance.

    Let X be the random variable for the number of people ordering a bottle of Chardonnay in a sample of 50.

    X ~ B[50, 0.1]

    H0: p = 0.1

    H1: p > 0.1

    Reject H0 if: P(X ≥ 11) ≤ 0.01

    P(X ≥ 11) = 1 - P(X ≤ 10)

    = 1 - 0.9906 (using tables)

    = 0.0094 < 0.01

    Reject H0: there is evidence to suggest that the proportion of people ordering Chardonnay has increased, at the 1% level of significance.
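    The p-values for the two one-tailed tests can be computed directly instead of read from tables. A minimal sketch with the standard library; `binomial_cdf` is our own helper.

```python
from math import comb


def binomial_cdf(r, n, p):
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


# Lower tail (die): X ~ B(24, 1/6) under H0, observed x = 1; p-value is P(X <= 1)
p_die = binomial_cdf(1, 24, 1 / 6)

# Upper tail (Chardonnay): X ~ B(50, 0.1) under H0, observed x = 11; p-value is P(X >= 11)
p_wine = 1 - binomial_cdf(10, 50, 0.1)

for name, p_value, alpha in (("die", p_die, 0.05), ("Chardonnay", p_wine, 0.01)):
    decision = "reject H0" if p_value <= alpha else "do not reject H0"
    print(f"{name}: p-value {p_value:.4f} vs {alpha}: {decision}")
```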


    Critical Values Method

    Example 1:

    A manufacturer claims that 2 out of 5 people prefer Soapy Suds washing powder over any other brand. For a sample of 25 people, only 4 people are found to prefer Soapy Suds. Is the manufacturer's claim justified?

    Test at the 5% level of significance.

    Let X be the random variable for the number of people who prefer Soapy Suds.

    X ~ B[25, 0.4]

    H0: p = 0.4

    H1: p < 0.4

    Reject H0 if: P(X ≤ xc) ≤ 0.05

    From tables: xc = 5 (P(X ≤ 5) = 0.0294, while P(X ≤ 6) = 0.0736)

    Since x = 4 ≤ 5, the observed value lies in the critical region.

    Reject H0: there is evidence to suggest that the manufacturer's claim is false and the proportion is less than 2 in 5, at the 5% level of significance.

    Example 2:

    A particular drug has a 1 in 4 chance of curing a certain disease. A new drug is developed to cure the disease. How many people would need to be cured in a sample of 20 for the new drug to be deemed more successful at curing the disease than the old drug, with a significant result at the 5% level?

    Let X be the random variable for the number of people who are cured by the new drug.

    X ~ B[20, 0.25]

    H0: p = 0.25

    H1: p > 0.25

    Reject H0 if: P(X ≥ xc) ≤ 0.05; xc = critical value

    1 - P(X ≤ xc - 1) ≤ 0.05

    P(X ≤ xc - 1) ≥ 0.95

    From tables: P(X ≤ 7) = 0.8982 and P(X ≤ 8) = 0.9591, so xc - 1 = 8

    xc = 9

    So 9 or more people are required to be cured to obtain significant evidence that the new drug is better at curing the disease.
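    The critical values in both examples come from scanning the cumulative probabilities. This sketch automates that search for a one-tailed binomial test; the function names are hypothetical helpers, not a library API.

```python
from math import comb


def binomial_cdf(r, n, p):
    # range(r + 1) is empty for r = -1, so P(X <= -1) = 0 as required
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


def lower_critical_value(n, p, alpha):
    """Largest c with P(X <= c) <= alpha (None if even P(X = 0) exceeds alpha)."""
    c = None
    for r in range(n + 1):
        if binomial_cdf(r, n, p) <= alpha:
            c = r
        else:
            break
    return c


def upper_critical_value(n, p, alpha):
    """Smallest c with P(X >= c) = 1 - P(X <= c - 1) <= alpha."""
    for c in range(n + 1):
        if 1 - binomial_cdf(c - 1, n, p) <= alpha:
            return c
    return None


print(lower_critical_value(25, 0.4, 0.05))   # Soapy Suds: reject H0 if X <= this value
print(upper_critical_value(20, 0.25, 0.05))  # new drug: reject H0 if X >= this value
```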

    Two Tail Test

    Example:

    A person suggests that the proportion, p, of red cars on a road is 0.3. In a random sample of 15 cars it is desired to test the null hypothesis p = 0.3 against p ≠ 0.3 at a nominal significance level of 10%. Determine the appropriate acceptance region and the corresponding actual significance level.

    Let X be the random variable for the number of red cars in a sample of 15.

    X ~ B[15, 0.3]

    H0: p = 0.3

    H1: p ≠ 0.3

    5% level of significance for each tail.

    Reject H0 if: P(X ≤ xl) ≤ 0.05; xl = lower critical value

    From tables: xl = 1

    Reject H0 if: P(X ≥ xu) ≤ 0.05; xu = upper critical value

    1 - P(X ≤ xu - 1) ≤ 0.05

    P(X ≤ xu - 1) ≥ 0.95

    From tables: xu - 1 = 7

    Therefore xu = 8

    H0 rejection region: x ≤ 1 or x ≥ 8, so the acceptance region is 2 ≤ x ≤ 7.

    Actual significance level: P(X ≤ 1) + P(X ≥ 8)

    = 0.0353 + 0.0500 = 0.0853 = 8.53%


    Hypothesis Testing for the Poisson Distribution

    Lower One Tail Test

    Example:

    The number of car accidents along a certain stretch of road occurred at an average rate of 5 per week. After the introduction of speed cameras the number of accidents in one week is 2. Assuming that the number of accidents can be modelled by a Poisson distribution, test at the 5% nominal significance level whether there has been a reduction in the number of accidents.

    Let X be the random variable for the number of accidents in a week.

    X ~ Po[5]

    H0: λ = 5

    H1: λ < 5

    Reject H0 if: P(X ≤ xl) ≤ 0.05

    From tables: xl = 1

    Since x = 2 > lower critical value, x is not in the critical region.

    Accept H0: there is insufficient evidence to support the claim that the number of accidents has reduced, at the nominal 5% significance level.

    Upper One Tail Test

    Example:

    A shop sells a particular make of radio at a rate of 4 per week on average. The shop places an advert in the local paper in the hope of raising sales. In the week that the advert was placed the number of sales was 10. Is there significant evidence that the sales have increased? Test at the 5% nominal level of significance.

    Let X be the random variable for the number of radios sold per week.

    X ~ Po[4]

    H0: λ = 4

    H1: λ > 4

    Reject H0 if: P(X ≥ xu) ≤ 0.05

    1 - P(X ≤ xu - 1) ≤ 0.05

    P(X ≤ xu - 1) ≥ 0.95

    From tables: xu - 1 = 8

    xu = 9

    x = 10 ≥ 9, so x lies in the critical region.

    Reject H0: there is evidence to suggest that the number of radios sold has increased, at the 5% level of significance.

    Two Tail Test

    Example:

    A machine produces glass sheets. The number of bubbles seen per square metre in the glass sheet follows a Poisson distribution with mean 3. Find the lower and upper critical values for a nominal 10% significance level test of whether the mean is not equal to 3, and the actual significance level of the test.

    Let X be the random variable for the number of bubbles per m².

    X ~ Po[3]

    H0: λ = 3

    H1: λ ≠ 3

    5% level of significance for each tail.

    Reject H0 if: P(X ≤ xl) ≤ 0.05; xl = lower critical value

    From tables: xl = 0

    Reject H0 if: P(X ≥ xu) ≤ 0.05; xu = upper critical value

    1 - P(X ≤ xu - 1) ≤ 0.05

    P(X ≤ xu - 1) ≥ 0.95

    From tables: xu - 1 = 6

    xu = 7

    Actual significance level: P(X ≤ 0) + P(X ≥ 7)

    = 0.0498 + (1 - 0.9665) = 0.0833 = 8.33%
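    The Poisson critical values for all three examples can be found by the same kind of search over cumulative probabilities. A minimal sketch (standard library only; helper names are our own):

```python
from math import exp, factorial


def poisson_cdf(r, lam):
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(r + 1))


def lower_cv(lam, alpha):
    """Largest c with P(X <= c) <= alpha, or None if no such c exists."""
    c, r = None, 0
    while poisson_cdf(r, lam) <= alpha:
        c, r = r, r + 1
    return c


def upper_cv(lam, alpha):
    """Smallest c with P(X >= c) = 1 - P(X <= c - 1) <= alpha."""
    c = 1
    while 1 - poisson_cdf(c - 1, lam) > alpha:
        c += 1
    return c


print(lower_cv(5, 0.05))  # accidents example, H1: lambda < 5
print(upper_cv(4, 0.05))  # radio sales example, H1: lambda > 4

# Bubbles example: 5% in each tail (assumes a lower critical value exists here)
lo, hi = lower_cv(3, 0.05), upper_cv(3, 0.05)
actual = poisson_cdf(lo, 3) + (1 - poisson_cdf(hi - 1, 3))
print(lo, hi, round(actual, 4))
```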