PS - Handbook.pdf

Embed Size (px)

Citation preview

  • 8/19/2019 PS - Handbook.pdf




    Lecture 1:  Exploratory Data Analysis And Descriptive Statistics 11.1  Studying one variable at a time 11.2  Studying two variables at a time 61.3  Studying more than two variables at a time 6

    1.4  Effect of Transformation 61.5  Percentiles 61.6  Exercises 8

    Lecture 2:  Probability 112.1  Probability 132.2  Operations with Probability 142.3  Additive Rules of Probability 152.4  Complement of Event A 152.5  Conditional Probability 162.6  Independent Events 17

    2.7  Intersection of events A and B 172.8  Bayes’ Rule  192.9  Exercises 21

    Lecture 3:  Discrete Random variables 253.1  Introduction 253.2  Bernoulli Distribution 273.3  Binomial Distribution 273.4  Poisson Distribution 303.5  Exercises 32

    Lecture 4:  Continuous Random variables 334.1  Introduction 334.2  Exponential Distribution 364.3  Exercises 38

    Lecture 5:   Normal Distribution 395.1  Introduction 395.2   Normal as an approximating distribution 405.3  Exercises 41

    Lecture 6: 

    Random Sampling and sampling distributions 436.1  Introduction 436.2  Exercises 46

    Lecture 7:  Test of Hypotheses 487.1  Introduction to Hypothesis Testing 487.2  Procedure 487.3  Confidence intervals for hypothesis testing 507.4  Proportions 517.5  Sample size 537.6  Hypothesis Test for Proportions 53

    7.7  Exercises 54

  • 8/19/2019 PS - Handbook.pdf



    Lecture 8:  Type I and II Errors 568.1  Introduction 568.2  Type I and II Errors 568.3  Exercises 56

    Lecture 9:  Further Hypothesis Tests 58

    9.1  Introduction 589.2  Comparison of two population means 589.3  The difference between two proportions 619.4  Paired samples 639.5  Exercises 65

    Lecture 10: Inference for Variance 6710.1  Introduction 67

    10.2  Confidence Interval for 2 6710.3  Confidence Interval for the Ratio of Two Variances 6810.4  Significance Test of Hypotheses about a Variance 68

    10.5  Significance Test of Hypotheses about Two Variances 6910.6  Exercises 70

    Lecture 11: Chi-squared Test 7211.1  Goodness-of-fit Test 7211.2  Test for Homogeneity 7411.3  Continuity Correction 7711.4  Exercises 78

    Lecture 12: Regression Analysis 8012.1  Introduction 8012.2  Correlation 8212.3  Regression 8312.4  Exercises 87

    Lab Assessment one 88

    Lab Assessment two 90

    Lab Assessment three 93

    Lab Assessment four 94

    Lab Assessment five 96

    Lab Assessment six 98

    Lab Assessment seven 99Lab Assessment eight 101

    Lab Assessment nine 104

    Lab Assessment ten 106

  • 8/19/2019 PS - Handbook.pdf





    Statistics can be divided into two major areas. Descriptive statistics comprises the statistical methods

    dealing with the collection, tabulation and summarization of data, so as to present meaningful

    information. Statistical inference, on the other hand, consists of the methods involved with the

    analysis and interpretation of data that will enable the statistician to develop meaningful inferences

    about the data. Both sub fields are interrelated; while descriptive statistics organizes the collected data

    in a systematic manner, statistical inference analyses the data and enables one to produce significant

    inferences about it.

    A population  is the totality of the observations with which a statistician is concerned. The

    observations could refer to anything of interest, such as persons, animals or objects; it need not belimited to people. The size of the population is defined to be the number of observations in the

     population. In collecting data concerning a population, the statistician is often interested in arriving at

    conclusions involving the entirety of the population.

    A sample  is a subset of a population. A random sample of n observations is a sample with n

    observations, selected in such a way that every such sample of the population has the same probability

    of being selected. These samples are considered to be unbiased.

    Often, a sample of the population is taken, data collected from it, and inferences about the population

    are made based on the analysis of the sample data.

    1.1 Studying one variable at a time

      A stem-and-leaf plot  is a graphical display showing the frequency of values in specifiedintervals. It is useful for small amounts of data as it retains the actual numerical values.


    Stem Leaf

    1  3456662  0000112233453  22334444484  1112345675  223367896  2697  458  9

    The stem is an integer and the leaf is a decimal value.

      A histogram  is a graphical way to display the shape of the distribution.

  • 8/19/2019 PS - Handbook.pdf



      A box-plot  is a graphical summary of the distribution of a variable. The minimum, the 1stquartile, the median, the 3rd quartile and the maximum are used to construct a box-plot.

    This is called the five-number summary.

    1. The ends of the box are at the quartiles.

    2. Mark the median with a line.

    3. Observations more than 1.5 * IQR outside the box are considered to be outliers and

    are marked with stars.

    4. Whiskers extend from the ends of the box to the smallest and largest observations

    that are not outliers.


    The statistical mean of a set of observations is the average of the measurements in a set of data. The

     population mean and sample mean are defined as follows:

    Class mid points

           F      r      e       q        u       e       n      c       y  







    Mean 25.74

    StDev 2.389

    N 250

    Histogram of BMI









    Box plot of BMI

  • 8/19/2019 PS - Handbook.pdf



    Given the set of data values , , . . . .,  from a finite population of size , the population mean  is calculated as


    Given the set of data values , , . . . .,  from a sample of size , the sample̅  1

    The sample mean is often used as an estimator of the mean of the population from whence the sample

    was taken. In fact, the sample mean is statistically proven to be a most effective estimator for the

     population mean.

    A tr immed mean of a set of values is a mean with a specified percentage of the largest and smallestvalues excluded from the calculation.


    The median of a set of observations is that value that, when the observations are arranged in an

    ascending or descending order, satisfies the following condition:

    1.  If the number of observations is odd, the median is the middle value.2.  If the number of observations is even, the median is the average of the two middle values.

    The median is the same as the 50th percentile of a set of data.


    The mode of a set of observations is the specific value that occurs with the greatest frequency. There

    may be more than one mode in a set of observations, if there are several values that all occur with the

    greatest frequency. A mode may also not exist; this is true if all the observations occur with the same


    Another measure of central location that is occasionally used is the midrange. It is computed as the

    average of the smallest and largest values in a set of data.

    Example 1.1: Given the following set of data

    1.2 1.5 2.6 3.8 2.4 1.9 3.5 2.5 2.4 3.0

    It can be sorted in ascending order:

    1.2 1.5 1.9 2.4 2.4 2.5 2.6 3.0 3.5 3.8

    The mean, median and mode are computed as follows:

  • 8/19/2019 PS - Handbook.pdf



     x   =10    

    = 2.48

     x~   = (2.4 + 2.5) / 2

    = 2.45

    The mode is 2.4, since it is the only value that occurs twice.

    The midrange is (1.2 + 3.8) / 2 = 2.5.

     Note that the mean, median and mode of this set of data are very close to each other. This suggests

    that the data is very symmetrically distributed.


    The range of a set of observations is the absolute value of the difference between the largest and

    smallest values in the set. It measures the size of the smallest contiguous interval of real numbers that

    encompasses all the data values.

    Example 1.2: Given the following sorted data:

    1.2 1.5 1.9 2.4 2.4 2.5 2.6 3.0 3.5 3.8

    The range of this set of data is 3.8 - 1.2 = 2.6.

      Variance and Standard Deviation

    The variance of a set of data is a cumulative measure of the squares of the difference of all the data

    values from the mean.

    The population and sample variance are calculated as follows:

    Given the set of data values , , . . . .,  from a finite population of size , the population varianceis calculated as,

       1 (  )

    Given the set of data values , , . . . .,   from a sample of size , the sample variance   iscalculated as,

        1 1 (  ̅)

  • 8/19/2019 PS - Handbook.pdf



     Note that the population variance is simply the arithmetic mean of the squares of the difference

     between each data value in the population and the mean. On the other hand, the formula for the sample

    variance is similar to the formula for the population variance, except that the denominator in the

    fraction is ( 1 ) instead of  . Using the above formula, the sample variance is statistically provento be a most effective estimator for the variance of the population to which the sample belongs.

    The standard deviation of a set of data is the positive square root of the variance.

    Example 1.3: Given the following sorted data:

    1.2 1.5 1.9 2.4 2.4 2.5 2.6 3.0 3.5 3.8

     x  = 2.48 as computed earlier

    2 s  =



      ((1.2 - 2.48) 2 + (1.5 - 2.48) 2 + (1.9 - 2.48) 2 + (2.4 - 2.48)2 

    + (2.4 - 2.48) 2 + (2.5 - 2.48)2 + (2.6 - 2.48) 2 + (3.0 - 2.48) 2 

    + (3.5 - 2.48)2 + (3.8 - 2.48)2)

    = (1 / 9) × (1.6384 + 0.9604 + 0.3364 + 0.0064 + 0.0064 + 0.0004 + 0.0144 + 0.2704 + 1.0404

    + 1.7424)

    = 0.6684

     s  = (0.6684) 1/2 = 0.8176

    The sample variance can also be calculated as follows:









    1   n





    i   x xnnn


    Example 1.4: Given the above data, we can calculate s using the above formula:



    i x1

    2  = 2222222222    

    = 1.44 + 2.25 + 3.61 + 5.76 + 5.76 + 6.25 + 6.76 + 9.00 + 12.25

    + 14.44= 67.52

    2 s   =910


    × (10 × 67.52-   28.24  )

    = 0.6684

    1.2 Studying two variables at a time

      A two-way frequency table gives the number of cases within each combination ofcategories of two qualitative variables. 

      A Scatter plot is a two-dimensional graphical display of two quantitative variables.  

  • 8/19/2019 PS - Handbook.pdf



    1.3 Studying more than two variable at a time

      A multiway frequency table or multidimensional contingency table displays the number

    of cases within each combination of categories of several qualitative variables.

    1.4 Effect of Transformation

    A transformation of a variable is a mathematical manipulation of each value of the variable. When

    we make a transformation, we transform the original scale of measurement for the variable to a new


    Many statistical techniques require that the data is approximately normally distributed so we often

    apply a transformation to the data. If the data is skewed to the right, we can try the natural logarithms

    or the square root. If the data is skewed to the left, we can try a power transformation greater than one.

    All these transformations are non-linear.

  • 8/19/2019 PS - Handbook.pdf



    1.5 Percentiles

    Percentiles are values in a given set of observations that divide the data into 100 equal parts. These

    values can be denoted by , , . . . . . ,  where1 % of the data falls below (is less than or equal to) P 12 % of the data falls below P2 



    99 % of the data falls below P99 

    Percentiles can be calculated using a sorted list of observations or the cumulative frequency

    distribution table corresponding to the observations. In the latter method, it is assumed that the values

    in a class interval are uniformly distributed within it; extrapolation is then used to calculate the

     percentiles. As this assumption is often untrue, percentile values can differ depending on whether raw

    data or frequency distributions were used in the computation. Therefore, percentiles are often treated

    as estimates for the value below which certain percentages of the observations fall.

    Example 1.1: Given the following sorted list of observations:

    0.7 0.8 0.9 1.1 1.2 1.4 1.9 2.2 2.2 2.3

    2.5 3.1 3.2 3.3 3.4 3.8 3.9 4.0 4.1 4.2

    4.3 4.6 4.7 5.0 5.2 5.5 5.6 5.8 5.9 6.1

    6.4 6.6 6.8 7.0 7.7 8.2 8.9 9.2 9.5 9.9

    P75  = 6.1, since 40 x 75 % = 30 and 6.1 is the 30th ranked value.

    P45 = 4.0, since 40 x 45 % = 18 and 4.0 is the 18th ranked value.

    P62  = 5.2, since 40 x 62 % = 24.8 and 5.2 is the 25th ranked value.

    This set of observations has the following cumulative frequency distribution:

    Measurements Cumulative Frequency Relative Cumulative Frequency

    0.0 - 1.0 3 0.075

    1.0 - 2.0 7 0.175

    2.0 - 3.0 11 0.275

    3.0 - 4.0 18 0.450

    4.0 - 5.0 24 0.600

    5.0 - 6.0 29 0.725

    6.0 - 7.0 34 0.8507.0 - 8.0 35 0.875

    8.0 - 9.0 37 0.925

    9.0 - 10.0 40 1.000

    Totals 40 1.000

    The percentiles can also be calculated from the cumulative frequency distribution table, using

    extrapolation to arrive at estimates:

      6.0 + 1.0 ∗(0.750.725)(0.850.725)  6.0 +

    0.0250.125  6.2


  • 8/19/2019 PS - Handbook.pdf



    where 6.0 is the upper class limit of interval 5.0 - 6.0 with cumulative frequency 0.725, and 0.850 is

    the cumulative frequency of the next interval, 6.0 - 7.0, with class width 1.0.

      4.0 , since the interval 3.0 - 4.0 has a cumulative frequency of 0.45  5.0 + 1.0 ∗ (0.6200.600)(0.7250.600)  5.0 +

    0.0200.125  5.16 

    where 5.0 is the upper class limit of interval 4.0 - 5.0 with cumulative frequency 0.600, and 0.725 is

    the cumulative frequency of the next interval, 5.0 - 6.0, with class width 1.0.

    The values of P75 and P62 differ  between the two methods of calculation, while the values of P45 for both

    methods are the same.

    Deciles are values in a given set of observations that divide the data into 10 equal parts. These values

    can be denoted by , , . . . . . , , where10 % of the data falls below D1 

    20 % of the data falls below D2 



    90 % of the data falls below D9 

    It is easy to see that

    D1 = P10  D4  = P40  D7 = P70 

    D2  = P20  D5 = P50  D8  = P80 

    D3  = P30 D6  = P60  D9  = P90 

    Quartiles are values in a given set of observations that divide the data in 4 equal parts. These values

    can be denoted by Q1, Q2 and Q3, where

    25 % of the data falls below Q1 

    50 % of the data falls below Q2 

    75 % of the data falls below Q3 

    Again, it is obvious that   ,    and   .Deciles and quartiles are calculated in the same manner as percentiles.

    1.6 Exercises

    1.  Construct a box-plot using the five-number summary, minimum, Q1, median, Q3, maximum,given as 48, 63, 70, 81 and 100 respectively.

    2.  Consider the following strength measurements.

    66 117 132 111 107 85 89 79 91 97 138 103

      111 86 78 96 93 101 102 110 95 96 88 122 115  92 137 91 84 96 97 100 105 104 137 80 104

  • 8/19/2019 PS - Handbook.pdf


  • 8/19/2019 PS - Handbook.pdf



    In 1984-1985, 482,528 men and 496,949 women received bachelor’s degrees 143,390 men

    and 142,861 women received master’s degrees, 21,700 men and 11,243 women received

    doctorates, and 50,455 men and 24,608 women received first professional degrees.

    (a) Arrange this information in one or more frequency tables.

    (b) Discuss the relationship between sex and degree, separately for the two academic years.(c) Discuss the relationship between year and degree, separately for men and women.

    1.7  Further Exercises (Probability and statistics: Walpole, Myers and Myers –  8th edition)Exercises - Page 52 - Question 1.13 & 1.14

  • 8/19/2019 PS - Handbook.pdf





    So far we have used tools of data analysis to learn about a collection of information. In formal statisticalanalysis, we go beyond the goals of data analysis. In general, statistical analysis (inference) involves

    making probability statements about populations based on what we observe in our samples. The ideas

    in probability that are needed for formal statistical inference are discussed in this lecture.

    Statisticians use the word experiment to describe any process that generates a set of data.

    An experiment is a process leading to a well-defined observation or outcome that generates a set of


    A simple example of a statistical experiment is the tossing of a coin. In this experiment there are only

    two possible outcomes, heads and tails.We are particularly interested in the observations obtained by repeating the experiment several times

    under the same conditions. In most cases the outcomes will depend on chance and, therefore, cannot

     be predicted with certainty. When a coin is tossed repeatedly, we cannot be certain that a given toss

    will result in head. However we know the entire set of possibilities for each toss.

    The sample space is the set of all possible outcomes of the experiment and is denoted by S. Each of

    the possible outcomes is called an element or a member of the sample space , or simply a sample point.

    If the sample space has finite number of elements we can list them as follows.

    The sample space of S, of possible outcomes when a coin is tossed, may be written as

    S = {H, T}Where H and T corresponds to “heads” and “Tails”. 

    In some experiment it is helpful to list the elements of the sample space systematically by means of a

    tree diagram. 

    Sample spaces with large or infinite number of sample points are best described by a  statement or a

    rule. For example, if the possible outcomes of an experiment are the set of cities in the world with a

     population over 1 million, the sample space is written

    S = { x | x is a city with a population over 1 million}

    A finite sample space is a sample space that contains a finite number of outcomes.

    The sample spaces that contain the outcomes of tossing a coin, drawings from a bag of mixed-colour

     balls, and dealings from a regular 52-card deck are examples of discrete sample spaces.

    A continuous sample space is a sample space that contains an interval of values.

    Sample spaces that contain the outcomes of temperature readings, height measurements, and salaries

    are examples of continuous sample spaces.

    An event is a subset of the sample space and is denoted by E.

    It may contain some, all or none of the outcomes comprising the sample space. If the event contains

    only one sample point, it is a  simple event . If the event contains two or more sample points, it is a

    compound event . And if the event contains no sample points, it is known as a null space.

  • 8/19/2019 PS - Handbook.pdf



    For any given experiment we may interested in occurrence of certain events rather than in the outcome

    of a specific element in the sample space. For example we may interest in the event A that the outcome

    when a die is tossed is divisible by 3. This will occur if the outcome is an element of the subset A =

    {3,6} of the sample space S = {1,2,3,4,5,6} of tossing a die experiment.

    To each event we assign a collection of sample points, which constitute a subset of the sample space.That subset represents all of the elements for which the element is true.

    The complement of an event A with respect to S is the subset of all the elements of s that are not in

    A. we denote the complement of a by the symbol  A .

    For example consider the sample space

    S = {A, B, C, D, E}. Let A = {B, D}. Then  A ={A, C, E}.

    The intersection of two events A and B, denoted by the symbol   B A   is the event containing allelements that are common to A and B.

    In the tossing of a die we might let A be the event that an even number occurs and  B the event that a

    number greater than 3 shows. Then the subsets  A = {2,4,6} and  B ={4,5,6} are subsets of the same

    sample space S={1,2,3,4,5,6}. Both A and B will occur on a given toss if the outcome is an element of

    the subset {4,6}, which is the intersection of A and B. So  B A  = {4,6}

    For certain statistical experiments it is usual to define two events that cannot occur simultaneously.

    Such events are said to be mutually exclusive.

    Two events  A and  B  are mutually exclusive, or disjoint if  B A , that is, if  A and  B have noelements in common.

    The Union of the two events A and B, denoted by the symbol  B A , is the event containing all theelements that belong to A or B or both.


    Consider tossing a die and observing the number that appears on top face. This has a well-defined

    outcome that is top face can be 1,2 3, 4, 5 or 6. So this can be taken as an experiment .The sample space S of the experiment is S = {1,2 ,3,4,5,6}.

    S consists of 6 definite outcomes. So S is a finite sample space.

    Some events on this sample space can be identified as even number occurs, odd number occurs and

    number greater than 3 occurs.

    Let  A be the event that an even number occurs,  B that an odd number occurs, and C that a numbergreater than 3 occurs. Then

    Throwing the die

    * * * * * *

    | | | | | |

    1 2 3 4 5 6

  • 8/19/2019 PS - Handbook.pdf



    A = {2,4,6}

    B = {1,3,5}

    C = {4,5,6}

    C  B = {1,3,4,5,6}C  B  = {5}

     B A  = {}, So A and B are mutually exclusive.

    2.1 Probability

    The probability of an event is the chance or likelihood of the event occurring.

    In this chapter we consider only those experiments for which the sample space contains a finite number

    of elements. The probability of an outcome or sample point is a real number, between 0 and 1 that

     provides a measure of likelihood that the outcome or sample point will actually occur. A sample point

    that absolutely cannot occur has a probability of 0, while a sample point that will always occur has a

     probability of 1; all other sample points are assigned a probability based on this relative measure.

    A probability function assigns a unique number or probability to each outcome.

    The probability of an event A is the summation of the probabilities of all the sample points in A and

    is denoted by P(A).

    If event A is a subset of the sample space S, then 0   P(A)  1. If A =  , then P(A) = P( ) =

    0; if A = S, then P(A) = P(S) = 1. Otherwise, the value of P(A) is between 0 and 1.

    Example2.2: A coin is tossed twice. What is the probability that at least one head occurs?


    The sample space for this experiment is S = {HH, HT, TH, TT}. If the coin is balanced each of these

    outcomes would be equally likely to occur. Therefore, we assign a probability of w to each sample

     point. Then 4w = 1,or w =1/4.If A represents the event of at least one head occurring, then

    A = {HH,HT,TH} and4







    1)(    A P   

    If an experiment can result in any one of N different equally likely outcomes, and if exactly n of theseoutcomes correspond to event A, then the probability of event A is


    n A P    )(  


    A mixture of candies contains 6 mints, 4 toffees, and 3 chocolates. If a person makes a random selection

    of one of these candies, find the probability of getting (a) mint, or (b)a toffee or a chocolate.


    Let M, T, and C represent the events that the person selects, respectively, a mint, toffee, or chocolatecandy. The total number of candies is 13, all of which are equally likely to be selected.

  • 8/19/2019 PS - Handbook.pdf



    (a) Since 6 of the 13 candies are mints, the probability of event M , selecting a mint at random, is


    6)(    M  P   

    (b) Since 7 of the 13 candies are toffees or chocolates, it follows that 13

    7 B A P   

    2.2 Operations with Probability

    Often it is easier to calculate the probability of other events. This may well be true if the event in

    question can be represented as the union or intersection of two other events or as the complement of

    some event.

    Just as events can be treated as sets, so can probabilities of an event (in a sense). The formulas used to

    calculate the probability of unions, intersections and complements of events are similar to the ones

    used for sets.

  • 8/19/2019 PS - Handbook.pdf



    2.3 Additive Rules of Probability

    Given that event A and event B are subsets of the sample space S, the following rules Union of events

    A and B (Additive Rule of Probability)

    If A and B are any two events, then  B A P  B P  A P  B A P     

    where  B A P     is the probability that either events A or B occur and    B A P     is the probability that both events A and B occur.

    If A and B are mutually exclusive, then

     B P  A P  B A P     

    Since if events A and B are mutually exclusive (i.e. A and B cannot occur together), P(A  B) =P( ) = 0

    In general we can write,If a set of events A1, A2 , A3,....., An are mutually exclusive, then

      nn   A P  A P  A P  A A A P        21121  

    2.4 Complement of event A

    Since   S  A A    , and the event A and its complement are mutually exclusive,

      1   A P  A P  A A A A P S  P     

     A P  A P      1  

    Example 2.4:What is the probability of getting a total of 7 or 11 when a pair of dice are tossed?


    Let A be the event that 7 occurs and B the event that 11 comes up. Now, a total of 7 occurs for 6 of

    the 36 sample points and a total of 11 occurs for only 2 of the sample points. Since all sample points

    are equally likely, we have   6 A P  and   81 B P  .The events A and B are mutually exclusive, sincea total of 7 and 11cannot both occur on the same toss. Therefore,






    1   B P  A P  B A P   

    This result could also have been obtained by counting the total number of points for the event    B A P    , namely 8, and writing






    n B A P   

  • 8/19/2019 PS - Handbook.pdf


  • 8/19/2019 PS - Handbook.pdf






     A P 

     A D P  A D P   

    2.6 Independent Events

    Although conditional probability allows for an alteration of the probability of an event in the light of

    additional material, it also helps to understand the concept of Independent Events. In the above

    example  A D P    |  differs from    D P  . This suggests that the occurrence of A influenced D. Howeverconsider the situation where we have events  A and  B  and    A P  B A P    | . In other words theoccurrence of B had no impact on the occurrence of A. Here the occurrence of A is independent of the

    occurrence B.

    Two events A and B are independent if and only if

     B P  A B P    |  and    A P  B A P    | . Otherwise, A and B are dependent.

    2.7 Intersection of events A and B (Multiplicative Rules of probability)

    If in an experiment the events A and B can both occur, then

      )|(   A B P  A P  B A P     

    Thus the probability that both A and B occur is equal to the probability that A occurs multiplied

     by the probability that B occurs, given that A occurs. Since the events  B A P    and  B A P     are equivalent, it follows from above rule that we can also write

      )|(   B A P  B P  A B P  B A P     

    If two events A and B are independent then

     B P  A P  B A P     

    If in an experiment, the events k  A A A A   ,,,, 321   can occur, then

    121213121321   ||   k k k    A A A P  A P  A A P  A P  A A P  A P  A A A A P      If the events k  A A A A   ,,,, 321   are independent, then

    k k    A P  A P  A P  A P  A A A A P    321321    

    Example 2.5:A card is drawn from a regular deck of 52 cards. Event A is the event that the card drawn is a Jack.

    Event B is the event that the card drawn is a diamond. Find the probability that the

    a.  card drawn is a diamond and a Jack. b.  card drawn is a Jack given that the card is a diamond.c.  card drawn is a diamond given that the card is a Jack.


    P(A) = 4 / 52 = 1 / 13 since there are 4 Jacks in the deck

    P(B) = 13 / 52 = 1 / 4 since there are 13 diamonds in the deck

    P(A  B) = 1 / 52 since there is only 1 Jack of diamonds in the deck

  • 8/19/2019 PS - Handbook.pdf



    P(A|B) = P(A B) / P(B) = (1 / 52) / (13 / 52) = 1 / 13 = P(A)P(B|A) = P(A B) / P(A) = (1 / 52) / (4 / 52) = 1 / 4 = P(B)

    As event A and B are two independent events we can get P (A B) using the formula also.That is P (A B) = P (A) P (B) = (1/13) (1/4) = 1/52

    Example 2.6:A bag contains 6 blue balls and 4 red balls. Two balls will be drawn from the bag. Calculate the

     probability of either one of the balls is blue.


    Let event A be the event that the first ball is blue, and let event B be the event that the second ball is

     blue. Then, the event A' will be the event that the first ball is red, and event B' will be the event that

    the second ball is red.

    Since there are 6 blue balls out of a total of 10 balls, the probability of choosing a blue ball in the firstdrawing is 6/10. If a blue ball is taken out, then there will only be 5 blue balls and 9 total balls left;

    the probability of choosing a blue ball will be 5/9. On the other hand, if the first ball is a red ball, then

    there will be 6 blue balls and a total of 9 balls, in which case there would be a 6/9 (or 2/3) probability

    of getting a blue ball.


    P(A) = 6/10

    P(A') = 1 - 6/10 = 4/10

    P(B|A) = 5/9

    P(B|A') = 2/3

    P(A  B) = P(A) P(B|A) = (6/10)(5/9) = 1/3P(A'   B) = P(A') P(B|A') = (4/10)(2/3) = 4/15

    Since A and A' are mutually exclusive events, (B  A) and (B   A') are also mutually exclusiveevents. Thus, we can calculate P(B) as follows:

    P(B)= P(B  S) = P( B   (A   A') ) = P( (B  A)   (B   A') )= P(B   A) + P(B   A')= 1/3 + 4/15

    = 9/15

    P (A   B) = P(A) + P(B) - P(A   B)

    = 6/10 + 9/15 - 1/3

    = 13/15

    (A   B) is the event that either one of the two balls drawn is blue. This being the case, P(A   B)is the probability that the first ball is blue, plus the probability that the first ball is red and the second

     ball is blue. Thus,

    P (A   B) = P(A) + P(A'  B)= 6/10 + 4/15

  • 8/19/2019 PS - Handbook.pdf



    = 13/15

    This is a different way to obtain the solution, but the result is the same nevertheless (Events A and B

    are independent events.)

    2.8 Bayes’ Rule 

    If the set of events A1 , A2 , ....., An  constitutes a partition of the sample space S, and event B is a

    subset of S, then

    B = B   S= B  (A1   A2   .........  An )= (B   A1  )   (B  A2 )   .......   (B   An )

    As the events A1  , A2  , ......, An  are mutually exclusive, then the events (B  Ai ), where i {1, 2,...., n }, is also mutually exclusive. Assuming that none of the events A1 , A2 ,....., An  is null, i.e. P(Ai 

    )  0 , i {1, 2,...., n }

    P(B) = P(B   A1  ) + P(B  A2 ) + ....... + P(B  An)= P(A1) P(B | A1  ) + P(A2) P(B | A2) + ....... + P(An ) P(B | An )

    Theorem of total probability

    If the events k  A A A   ,...,, 21   constitute a partition of the sample space S such that,   0i A P    fori=1,2,…,k, then for any event B of S,




    i   A B P  A P  A B P  B P 11


    From the definition of conditional probability,





     B P 

     A B P  A P 

     B P 

     B A P  B A P    iiii  


    Thus we have derived Bayes' Rule, which states the following:

    Bayes' Rule :

    If the set of events A1 , A2 ,....., An constitutes a partition of the sample space S, P(Ai )   0 , i {1,

    2,....., n }, and event B is a subset of S, P(B)  0,



    2211   nn


    i A B P  A P  A B P  A P  A B P  A P 

     A B P  A P  B A P 


    Example 2.7:A family had plans to go fishing on a Sunday afternoon, but their plans were dependent on the weather

    at noon Sunday. If it was sunny, then there was a 90 % chance that they would go fishing. If it was

    cloudy, then the probability that they would go fishing would drop to 50 %. And if it was raining, the

    chances dropped to 15 %. The weather prediction, which we can assume to be accurate, called for a10 % chance of rain, a 25 % chance of clouds, and a 65 % chance of sunshine.

  • 8/19/2019 PS - Handbook.pdf



    Set event F as the event that the family goes fishing

    S as the event that the weather is sunny at Sunday noon

    C as the event that the weather is cloudy at Sunday noon

    R as the event that the weather is rainy at Sunday noon

    Assuming that the family ends up going fishing, find the probability of each type of weather occurring.


    P(S) = 0.65, P(C) = 0.25, P(R) = 0.10

     Note that P(S) + P(C) + P(R) = 1, and of course S, C and R are mutually exclusive events.

    P(F|S) = 0.90, P(F|C) = 0.50, P(F|R) = 0.15

    P(F) = P(F|S) P(S) + P(F|C) P(C) + P(F|R) P(R)

    = (0.90)(0.65) + (0.50)(0.25) + (0.15)(0.10)

    = 0.585 + 0.125 + 0.015

    = 0.725

    Assuming that the family ends up going fishing, the probability of each type of weather occurring isP(S|F) = probability of sunny weather, given that the family went fishing.



     F  P 

    S  P S  F  P  =


    )65.0)(90.0( = 0.807

    P(S|F) = probability of cloudy weather, given that the family went fishing.



     F  P 

    C  P C  F  P  =


    )25.0)(50.0( = 0.172

    P(S|F) = probability of rainy weather, given that the family went fishing.



     F  P 

     R P  R F  P  =


    )10.0)(15.0( = 0.021

     Note that P(S|F) + P(C|F) + P(R|F) = 0.807 + 0.172 + 0.021 = 1.000

    2.9 Exercises (Extracted from Schaum’s Series by Walpole & Mayer)

    1.  A pair of dice is tossed and the two numbers appearing on the top are recorded. Draw the samplespace and find the number of elements in each of the following events:

    (a)  A = { two numbers are equal }(b)  B = { sum is 10 or more }(c)  C = { 5 appears on first die }(d)  D = { 5 appears on at least one die }

    2.  Determine the probability p of each event:(a) An even number appears in the toss of a fair die.

  • 8/19/2019 PS - Handbook.pdf



    (b) At least one tail appears in the toss of 3 fair die.(c) A white marble appears in the random drawing of 1 marble from a box containing 4 white

    marbles, 3 red marbles and 5 blue marbles.

    3.  A box contains 15 billiard balls, which are numbered from 1 to 15. A ball is drawn at random andthe number recorded. Find the probability P  that the number is;

    (a) Even(b) Less than 5(c) Even and less than 5(d) Even or less than 5

    4.  A class contains 10 men and 20 women of which half the women and half the men have browneyes. Find the probability P  that a person chosen at random is a man or has brown eyes.

  • 8/19/2019 PS - Handbook.pdf



    5.  A sample space S consists of 4 elements, that is, S = 4321   ,,,   aaaa . Under which of the followingfunctions P  does become a probability space?

    (a)    3.0,2.0,3.0,4.0 4321     a P a P a P a P   

    (b)    1.0,7.0,2.0,4.0 4321     a P a P a P a P   

    (c)    3.0,1.0,2.0,4.0 4321     a P a P a P a P   (d) 

      1.0,5.0,0,4.0 4321     a P a P a P a P   

    6.  Suppose A and B are events with   ,6.0 A P      ,3.0 B P   and   .2.0 B A P   Find the probabilitythat:

    (a) A does not occur.(b) B does not occur.(c) A or B occurs.(d)  Neither A nor B occurs.

    7.  Three fair coins, a penny, a nickel, and a dime, are tossed. Find the probability p that they are allheads if:

    (a) The penny is heads(b) At least one of the coins is heads,(c) The dime is tails

    8.  A billiard ball is drawn at random from a box containing 15 billiard balls numbered 1 to 15, andthe number n is recorded.

    (a) Find the probability p that n exceeds 10.(b) If n is even, find the probability p that n exceeds 10.

    9.  In a certain college, 25 percent of the students failed mathematics, 15 percent failed chemistry, and10 percent failed both mathematics and chemistry. A student is selected random.

    (a)  If the student failed chemistry, what is the probability that he or she failed mathematics?(b) If the student failed mathematics, what is the probability that he or she failed chemistry?(c) What is the probability that the student failed mathematics or chemistry?(d) What is the probability that the student failed neither mathematics nor chemistry?

    10. Find  A B P    |  if :(a) A is a subset of B.(b) A and B are mutually exclusive (disjoint) Assume     0 A P  .

    11. If the probabilities are, respectively, 0.09,0.15,0.21, and 0.23 that a person purchasing a newautomobile will choose the color green, white, red, or blue, what is the probability that a given

     buyer will purchase a new automobile that comes in one of those colors?

    12. Suppose that a factory has a fuse box containing 20 fuses, of which 5 are defective. If 2 fuses areselected at random and removed from the box in succession without replacing the first, what is the

     probability that both fuses are defective?

    13. Items sampled on a production line may be classified as defective (D) or non-defective (N). Listelements in the sample space if sampling process terminates:

    (a) After 4 items have been sampled.

  • 8/19/2019 PS - Handbook.pdf



    (b) After 3 defectives in a row have been observed or 4 items have been sampled.(c) When the first defective is observed.

    Suppose, 5% of the products are defective.

    (d) Find the probability of exactly 2 defective items if sampling processes (a) is adopted.(e)  In the sampling process (c), what is the probability that the sampling process is terminated

     before the 3rd item is sampled?

    14.  It is compulsory for the driver of a car to wear a seat belt while driving. The results of a surveyshow that not all drivers are wearing seat belts.

    Age Driver wearing

    seat belt

    Driver not

    wearing seat belt

    < 40 375 52

    >= 40 425 148

    Use the data to estimate the probability that a randomly chosen driver(a)  Is wearing a seat belt.(b) Is under 40 and wearing a seat belt.(c) Suppose the randomly chosen driver is under 40. What is the probability that the driver is

    wearing a seat belt?

    15. In a certain region of the country it is known from past experience that the probability of selectingan adult over 40 years of age with cancer is 0.05. If the probability of a doctor correctly diagnosing

    a person with cancer as having the disease is 0.78 and the probability of incorrectly diagnosing a

     person without cancer as having the disease is 0.06, what is the probability that a person isdiagnosed as having cancer?

    16. In a certain assembly plant, three machines, B1, B2, and B3, make 30%,45%,and 25%,respectively,of the products. It is known from past experience that 2%, 3%, and 2% of the products made by

    each machine, respectively, are defective. Now, suppose that a finished product is randomly

    selected. What is the probability that it is defective?

  • 8/19/2019 PS - Handbook.pdf



    17. Suppose that three machines at a factory are used to produce a large quantity of identical parts. The production machines have different capacities: Machine A has a large capacity and produces 60%

    of the parts, while machines B and C produce 30% and 10% of the parts, respectively.

    Historical data indicate that 10% of the parts produce by Machine A are defective, compared to

    30 % for Machine B and 40% for Machine C.

    (a) Complete the following table.

    (b) What are the conditional probabilities, updated in light of the evidence that the part is defective

    of machine A, B or C having produced it?

    2.9 Further Exercises (Probability and statistics: Walpole, Myers and Myers –  8th edition) 

    Exercises - Page 97 - Question 2. 109, 2. 110, 2. 111, 2. 112, 2. 129



    3.1 Introduction

    Machine Defective Nondefective Total




    Total 100

  • 8/19/2019 PS - Handbook.pdf



    Definition: A random variable X is a numerically valued variable defined on the sample space, . R:X  

    We say that X is a discrete random variable if it can take only a countable set of values, i.e. integer

    or rational values.

    Consider tossing a fair coin. We know that the outcome is either a head or a tail.

    P(head) =2

    1 , P (tail) =


    If we denote the number of heads by X, then

    P(X = 1) =2

    1 , P (X = 0) =


    X is an example of a random variable. Note that a random variable is usually labelled with a capital

    letter (say X). The realised value of the random variable X is denoted by x.

    Definition: If we have a discrete random variable X taking values n21   x,.....,x,x  

    with probabilities n21   p,......., p, p  respectively, where

    i,0 p1 p..... p p p in321   ,

    then this defines a discrete probability distribution for X. Although we have written the random

    variable X as taking a finite set of values in this definition, it also holds for an X which takes an infinite

    countable set of values, e.g. all non-negative integers.

    We may write P( X = xi) as pi. This is sometimes referred to as the probability function for X.

    Example 3.1: Two fair dice are thrown. Let X be the sum of the values on the faces turned uppermost.Find the probability distribution for X.

    The sample space can be shown as follows.

    X 2 3 4 5 6 7 8 9 10 11 12


    36 2

    36 3

    36 4

    36 5

    36 6

    36 5

    36 4

    36 3

    36 2

    36 1


     Note that X = 2 if and only if both dice show 1. Also, X = 3 if and only if one die shows 1 and the

    other 2.

     Note that the sum of the probabilities is one and all are positive so this is a valid probability distribution.

    Example 3.2: The discrete random variable X  has probability function given by

    P(X=x) = cx2  , x=1,2,3,4. Find C.

    X 1 2 3 4

    P(X) c 4c 9c 16c

    We know that c + 4c + 9c + 16c = 1 and hence c =301  

  • 8/19/2019 PS - Handbook.pdf



    Definition: Suppose X is a discrete random variable taking values n21   x,.....,x,x , with probabilities

    n321   p,....., p, p, p then the mean or expected value of X, written as   or E[X] is given by


    i   1

    ii x pE(X)μ  

    If X takes an infinite number of values the sum is taken over all values of i.

    To justify this definition, suppose we had a sample of x values where x occurs with frequency f . Thenthe sample mean would be


    xf x









    In the limit if we collect enough data   ii   f f  tends to p. 

    Example 3.3: A die is thrown, what is the mean (or expected) score?












    11)(    X  E  = 3.5

     Note that the expected value of a random variable is not necessarily a value the random variable can

    take. The expected score when we throw a. fair die is 2 1/2, but a die cannot take this value. Think of

    the expected value or mean of a random variable as a measure of where the distribution is centred


    The expectation of any function of a random variable, g(X) say, is defined in a similar way.

    Definition: If X is a discrete random variable then the expectation of X is given by



    ii   )g(x pE[g(X)]  

    We can also define the variance and standard deviation of a random variable.

    Definition: If X is a discrete random variable then its variance, written Var[X] is defined by




    ii   μ)(x pVar[X]  

    The standard deviation of X is the positive square root of the variance of X.By multiplying out the bracket it is straightforward to see that the variance is given by

        22ii   μ)x p(Var[X]   or 22 (E[X])]E[XVar[X]    

    Example 3.4 : A die is thrown, what is the variance of the score?














    11]E[X   2222222  







    The variance gives an idea of how spread out the distribution is.

  • 8/19/2019 PS - Handbook.pdf



    3.2 Bernoulli Distribution

    X 0 1

    P(x) 1-p P

    Example 3.5 : Toss a coin once. Let p be the probability of getting a head and

    X = 0 if T occurs

    = 1 if H occurs

    Then X ~ Bernoulli (p).

    3.3 Binomial Distribution

    If we have n Bernoulli trials with probability of a success equal to p then the probability of r successes

    is given by the binomial probability

    n.,2,1,0,r  p)(1 pc)(   r nr r n

      r  X  P   

    Thus, if we consider the random variable X which is the number of successes in n Bernoulli trials, then

    P(X = r) is given by the binomial probability with parameters n and p.

    Statistical table 1 gives the probability of r or more successes in n independent trials with the

     probability of success p. For example if we wanted the probability of obtaining 23 or more heads in50 tosses of a fair coin we find that the answer is 0.76006.

    Example 3.6: Suppose that 5% of the articles made by a factory are defective. What is the probability

    of finding 1 defective in a sample of 10 from a very large batch? Since it is a large batch we may treat

    this as sampling without replacement and the number of defectives, X, will have a binomial distribution

    with n =10 and

     p = 0.05. Thus


    10)1(   9



      X  P   

    We can also find this quantity from the tables, 31512.008614.040126.0)2()1()1(     X  P  X  P  X  P   

    The tables are only given for some values of n and p so are not always useful, but you should knowhow to use them. Note that although p is only given up to 0.5, we can always turn a problem where the

     probability of a ‘success’ is greater than 0.5 into a question about ‘failures’ which will have probability

    less than 0.5. An example of this is given next.

    Example 3.7: Fifty seeds were planted and it is known that the probability of any seed germinating is

    0.8. Assuming that the number of seeds germinating follows a binomial distribution, using tables find

    the probabilities of the following events (a) exactly 40 seeds germinate,

    (b) more than 12 seeds fail to germinate,(c) more than 38 but fewer than 45 seeds germinate.

  • 8/19/2019 PS - Handbook.pdf


  • 8/19/2019 PS - Handbook.pdf




     p pnp

     p pk 


     p pr 

    nr np

     p pr 


     p pr nr 


     p pr nr 


     p pr 


     p pr 


    r  X rP  X  E 



    k nr 


    r nr 


    r nr 


    r nr 


    r nr 


    r nr 


    r nr 











































    Recall that 22 ]][[][][   X  E  X  E  X Var     

     Now ][)]1([][][  22

     X  E  X  X  E  X  E  X  E     



























     p p pnn

     p pk 

    n pnn

     p pr 

    n pnn

     p pr 


     p pr nr 


     p pr nr 

    nr r 

    r  X  P r r  X  X  E 



    r nk 


    r nr 


    r nr 


    r nr 


    r nr 












  • 8/19/2019 PS - Handbook.pdf



    Thus )1()(.)1(][   22  pnpnpnp pnn X Var     

    Example 3.8: The random variable X has a binomial distribution with parameters n=100 and p=0.8.

    Find the mean and the variance of X.

    The mean  = np = 80, the variance is np (1- p) =16

    3.4 Poisson Distribution

    Suppose events occur at random at an average rate   per minute. Examples include radioactive decayand arrivals in a queue. Then the distribution of the number of events which occur in one minute, is

    said to have a Poisson distribution with parameter  . If X has a Poisson distribution then


    )exp(][     r r 

    r  X  P r  


    where  >0. Note that





    0 0




    r r 

    r r 

    r r 


    So this is a valid probability distribution.

    It can be shown that        )(,)(   X V  X  E   

    Statistical Table (3) gives the probability that a Poisson random variable with mean   will be greateror equal to r in the same way as the binomial tables. For example, suppose that X is a random variable

    with Poisson distribution with mean 2.0.Find )2()3()3()2()2()1(     X  P  X  P  X  P   

    27067.032332.059399.0)3()2()2(     X  P  X  P  X  P   



     X  P  X  P 

     X  P  

    A property of the Poisson distribution is that if X  is Poisson with mean   then kX  is Poisson with mean

    k  . This can be useful in calculating probabilities of numbers of event in a time period different tothat for which information is given.

     Note that if X has a binomial distribution with parameters n and p that np X  E    )( and

    )1(][   pnp X Var    . Now if p is small then 1-p is close to one and np(1-p)   np .This suggest that if pis small we may be able to approximate X by a Poisson random variable with mean np. So long as p

    is small (may be < 0.1) and n is large (may be >50) a binomially distributed random variable is well

    approximated by a Poisson random variable of mean np.

    Example 3.9: IF X  has a binomial distribution, n=100, p=0.01 then from the tables




     X  P 

     X  P 

     X  P 


  • 8/19/2019 PS - Handbook.pdf



    The corresponding quantities from the Poisson tables with  =1 are




    Y  P 

    Y  P 

    Y  P 


    Example 3.10: The probability that a car has defective gearbox is 0.02. If I check the gearboxes of

    140 cars what is a suitable approximation to the probability that I find

    (a) 2 defectives (b) more than 5 defectives (c) fewer than 4 defectives

    Let X  be the number of defective gearboxes that I find. Then X  has a binomial distribution with n=140

    and p=0.02. Since n is large and p is small a Poisson random variable with mean 8.2 np   will

    give a good approximation to X  tables




     X  P  X  P c

     X  P  X  P b

     X  P  X  P  X  P a


    3.5 Exercises

    1.  A manufacturing process produces components which are free from any faults with probability p. Find the probability that in a sample of size 50 from a large batch there are fewer than 4

    faulty components when  p = 0.95. Find the probability that in a sample of size 50 there are

    fewer than 10 faulty when  p = 0.75.

    2.  Use the table to give a suitable approximation to the probability that 5 X   where X  is binomialrandom variable with parameters p = 0.05 and n = 400.

    3.  A car-pooling study shows that the number of passengers, X in a car (excluding the driver) islikely to assume the values 01,2,3 and 4 with probabilities given by the table.

    X 0 1 2 3 4

    P(X=x) 0.7 0.1 0.1 0.05 0.05

    (a) Determine the probability of at least two passengers in a car.(b) Find the cumulative distribution function of X and sketch it.(c) Calculate

    (i)  E(X)(ii)  E(X2)(iii)  V(X)

    4.  Suppose that in late summer, the Fremantle Surf Life Saving club makes an average of two surfrescues per day Use the Poisson probability distribution to determine the probability that

    (a) More than two rescues are made on a particular day.

  • 8/19/2019 PS - Handbook.pdf



    (b) Five surf rescues are made in a 3-day period.

    3.6 Further Exercises (Probability and statistics: Walpole, Myers and Myers –  8th Edition) 

    1.  Exercises - Page 189 - Question 5.51 –  5.70

  • 8/19/2019 PS - Handbook.pdf





    4.1 Introduction

    A random variable X is a numerically valued variable defined on the sample space,  X:   R

    We say that X is a continuous variable if it is not discrete.

    Definition: If X is a continuous random variable then there exists a non-negative function, f(x), called

    the  probability density function of X such that




    dx x f  b X a P 

    dx x f  




     Note that any function, which is non-negative and integrates to one is a possible probability density

    function for a random variable X. As with discrete random variables some density functions are

    commonly used to model continuous random variables. It is also convenient to define the following


    Definition The cumulative distribution function, F(x) of a continuous random variable x is defined by

    dx x f  t  X  P t  F    )()()(  

     Note that for a discrete random variable the cumulative distribution function

    P(X   x) will be a step function with steps of height P(X = x) at the points at which X is defined. Thecontinuous version can be thought of as a limiting case when all values of x in an interval are possible.

     Note that the cumulative distribution function is always non-decreasing and


     x F    0)(lim  


     x F    1)(lim  

    We define the mean of a continuous random variable as follows.

    Definition If X is a continuous random variable with probability density function f(x) then the mean

    or expected value of X, E[X] or p is defined by

      dx x xf   X  E    )(][      

    We define the expectation of a function of X in a similar way

    Definition  If X is a continuous random variable with probability density function f(x) then the

    expected value of g(x) is defined by

  • 8/19/2019 PS - Handbook.pdf



      dx x f   x g  X  g  E    )()()]([  

    Similarly the variance is defined by

    Definition If X is a continuous random variable with probability density function f(x) then the variance

    of X, Var [X] is defined by







    dx x f   x

    dx x f   x xVar 


    We can also define the median of a continuous random variable.

    Definition if X is a continuous random variable with probability density function f(x) then the median

    of x is the value m satisfying the equation



    dx x f  dx x f  2


    It is the value such that X is equally likely to be more than the median as less than it.

    Example 4.1: A random variable X has probability density function.




    2  xif   xcx x f    

    1. Determine c.

    2. Find E[X].

    3. Find Var[X].

    4. Show that the median m satisfies the equation

    0186   34   mm  Solution:

    1. We know that

    1)(   dx x f    


  • 8/19/2019 PS - Handbook.pdf















    dx x xc

     x x  

    and hence c = 12.













     x x

    dx x x X  E 
















     x x

    dx x x X  E 











      X Var   


  • 8/19/2019 PS - Handbook.pdf



















     x x

    dx x x



    4.2 Exponential Distribution

    The exponential distribution can be used to model the lifetimes of components. It is also linked to the

    Poisson distribution. If X has a Poisson distribution then the time between occurrences of X follows

    an exponential distribution.

    The probability density function for an exponential distribution is




     xif   x x f  


    We shall check first that this is a valid p d f. Clearly   0)(    x f   . Also



      xdx x      


    To find the mean we use integration by parts














    dx x x x

    dx x x X  E 


  • 8/19/2019 PS - Handbook.pdf



    To find the variance we first find E[X2]. This is also done by integration by parts



















     X  E 

    dx x x x x

    dx x x X  E 





        X Var   

    The cumulative distribution function is given by




     xif  dt t  f  

     xif   x X  P  x F    x  






    0 0


    dt t dt t  f  





    The median m is given by F(m) = ½. Therefore substituting into the cdf


















    4.3 Exercises

    1. The random variable X has probability density function   3)2()(   xc x f     for 0

  • 8/19/2019 PS - Handbook.pdf



    2.  Assume that the continuous random variable x has the probability density function




    2  x for  xk  x f    

    (a) Calculate the value of k .

    (b) Find the mean and variance of x.(c) Find the cumulative distribution function of x.

    (d) Find the median of x.

    (e) Find P(1/2x< 1).

    3.  The time (in hours) between successive calls has an exponential distribution with parameter 1/6 . What is the probability of waiting more than 15 minutes between any two successivecalls?

    4.  Identify and name the continuous random variables from the following list of variables: X : the number of automobile accidents per year in Virginia.

    Y : the length of time to play 18 holes of golf.

     M : the amount of milk produced yearly by a particular cow.

     N : the number of eggs laid each month by a hen.

     P : the number of building permits issued each in a certain city.

    Q: the weight of grain produced per acre.

    4.4 Further Exercises (Probability and statistics: Walpole, Myers and Myers –  8th Edition) 

    1. Exercises - Page 112- Question 3.7, 3.9, 3.12, 3.21

  • 8/19/2019 PS - Handbook.pdf





    5.1 Introduction 

    The normal, or Gaussian, distribution is the most commonly used distribution in statistics. A normally

    distributed random variable with mean    and variance 2   has its probability function given by

      xfor ]/2σμ)(xexp[2πσ

    1φ(x)   22  

    It is denoted by )σ N(μ(~X   2 .

    If X is normally distributed with mean 0 and variance 1, then we write   N(0,1)~X . Its probability

    density function is usually written as  (x) and is given by

      xfor )2xexp(2π

    1φ(x)   2  

    The cumulative distribution function is denoted by (x).

    We can calculate probabilities for a normal distribution from the standard normal using



    Statistical Table (4) gives the probability that a standard normal random variable, i.e. with mean zero

    and variance 1, is larger than specified value. i.e. 1-(x).. In using the tables we utilise the symmetryof the normal distribution, and the fact that 0.50)P(Z0)P(Z    

    Example 5.1: Calculate the probabilities of the following events.

    (i)  Z < -2.45,(ii)  (Z < - 2.1) ( Z > 2.1)(iii)  0 < Z < 1.2


    (i) By symmetry P ( Z < -2.45) = P (Z > 2.45) = 0.00714

    (ii) By symmetry P [( Z < -2.1)  P ( Z > 2.1) ] = 2 P ( Z > 2.1) = 2 x 0.01786 = 0.03572(iii)

    P [ Z > 1.2] = 0.11507P [Z < 1.2 ] = 1 –  0.11507

    = 0.88493

    P[0 < Z < 1.2] = 0.88493 –  0.5

    = 0.38493

    Example 5.2: It is known that in a certain district the heights of adult males are normally distributed

    with mean 175cm and standard deviation 7cm. Find the probability that a man selected at random from

    this district will be

    (a) over 182cm tall.

    (b) between 170cm and 181cm tall.

    (c) under 179cm tall.

    Let X be the height of the selected man.

  • 8/19/2019 PS - Handbook.pdf



    Then ) N(175,7~X   2  Z = (X-175)/7 ~ N (0,1)

    (a) P( X > 182) = P ( Z > (182 –  175)/7) = P (Z > 1) = 0.159

    (b) P( 170 < X < 181) = P ( -5/7 < Z < 6/7)

    = P (Z >-5/7) –  P (Z > 6/7)

    = 0.7625 –  0.1968 = 0.566

    (c) P (X< 179) = P ( Z < 4/7) = 1 –  P( Z > 4/7)  1 –  0.284 = 0.716 

    5.2 Normal as an Approximating Distribution

    When n  is large and p moderate we may use the normal distribution to approximate binomial

     probabilities. Note that as we are approximating a discrete random variable by a continuous one, wehave to employ continuity correction.

    For discrete random variable P( X < x) = P ( X  x-1) We approximate these quantities by P (Y < x -

    21 ) . We illustrate the technique in the following example.

    Example 5.3: A fair coin is tossed 150 times. Find a suitable approximation to the

     probability of each of the following events.

    (a) more than 70 heads

    (b) fewer than 82 heads(c) more than 72 but fewer than 79 heads.

    Let X be the number of heads thrown, then X has a binomial distribution with n = 150 and p = ½ . As

    n is larger and p moderate we may approximate X by Y a normal random variable with mean np = 75

    and variance np(1-p) = 37.5.

    a.  We require P(X > 71) but this is the same as P(X  70 ) so we approximate by P (Y > 70.5).


     b.  We require P( X < 82) but this is the same as P(X 81) so we approximate

     by P(Y < 81.5). P(Z < (81.5 –  75)/   5.37 )   P(Z < 1.06) = 1- 0.145 = 0.855

  • 8/19/2019 PS - Handbook.pdf



    (c) We require P (72 < X < 79) which is the same as P (73  X  78) and thus we approximate by(72.5 < y < 78.5).

    P(-0.408 < Z < 0.571) = 0.658 –  0.284 = 0.374 

    We may similarly approximate a Poisson random variable by a normal one of the same mean and

    variance so long as this mean is moderately large. We again have to use the continuity correction. 

    Example 5.4: A radioactive source emits particles at random at an average rate of 36 per hour. Find

    an approximation to the probability that more than 40 particles are emitted in one hour.

    Let X be the number of particles emitted in one hour. Then X has a Poisson distribution with mean 36

    and variance 36. We can approximate X by Y which has a N(36, 36) distribution. We require P(X  >

    40). This is approximately P(Y  40.5).






    5.3 Exercises

    1.  The sample data consists of the values:

    0.325 0.317 0.375 0.325 0.508 0.117 0.150 0.317 0.275 0.383

    Do they appear to come from a Normal Distribution?

    (i)  What is the percentage of values within one standard deviation of the mean?(ii)  What is the percentage of values within two standard deviations of the mean?Do they appear to come from a Normal Distribution? Justify your answer.

    2.  Construct a Normal probability plot using SPSS for the data given in (1) and explain how itcould be used for checking normality.

    3. 94 95 30 98 76 73 95 97 86 91 85 70 96

      70 91 72 97 97 84 28 19 90 77 58 58 47

      48 28 20 65

    (a) Plot these data.(b) Find the mean and the standard deviation for this data.(c) Let X be a Gaussian (Normal) random variable with mean and standard deviation you

    calculated in part (b). Find the following probabilities.

  • 8/19/2019 PS - Handbook.pdf



    (i)  P(X < 30)(ii)  P(X > 90)(iii)  P(50 < X < 80)

    (d) Find the proportion of data values that are

    (i)  less than 30(j)  greater than 90(k) from 50 to 80

    (e) Can the distribution of these values approximated by a Normal distribution?

    4. The lengths of a batch of bolts are assumed normally distributed with mean 4cm and standard

    deviation 0.1cm. What is the probability that a bolt selected at random will be more than

    4.1655cm in length? (Give answer to 5 dp)

    5. A coin is to be tossed 100 times.

    (a) Assuming the coin is biased with P(head) =0.6, use a normal approximation to estimate

    the probability that between 56 and 63 heads occur.

    (b) Assume P(head)=0.99. Use a suitable approximation to estimate the probability that

    exactly 99 heads occur. (Do not calculate the exact binomial probability).

    5.4 Further Exercises (Probability and statistics: Walpole, Myers and Myers –  8th Edition) 

    1. Exercises - Page 209 - Question 6.1 –  6.10

  • 8/19/2019 PS - Handbook.pdf





    6.1 Introduction

    Definition: The  sampling distribution of a random variable is the collection or distribution of all

     possible values of the random variable over all possible samples. If the sample is a random sample of

    size n from an infinite population then, x1,x2,…xn  are independent random variables each with the

    same distribution (i.e. same p.d.f or probability function) as the population so that

    E(xi) =   Var (xi) = 2 

    Theorem 1 Averaging over all random samples of size n from an arbitrary population with mean  and variance

    2 , the sample mean   x and sample variance 2

     s have the following three properties:

    E (   x) =  i.e is an unbiased estimator of    

    Var (   x) = 2/ni.e. the variability of as an estimator decreases with n.

    22 σ]E[s     i.e. s2 is an unbiased estimate of    2  .

    Thus s2/n is used as an unbiased estimate of the variability or variance of    x as an estimator of μ .

    Example 6.1: An infinite population is described by an asymmetrical discrete distribution with just

    two values: -3 with probability 0.3 and +1 with probability 0.7. Thus we have


    36.3)2.0(7.0)(10.33)(μ]E[XVar[X]σ   222222  

    These are the values of the (usually unknown) population parameters. Let us now look at all samples

    of size of 3. There are infinitely many, but we can tabulate them as follows:

    Sample observations  x   s2  P(sample)

    -3, -3, -3 -3 0 (0.3)3 = 0.027-3, -3,1 -5/3 32/6 3(0.3)2  x 0.7 = 0.189

    -3,1,1 -1/3 32/6 3(0.7)2  x 0.3 = 0.441

    1,1,1 1 0 (0.7)3 = 0.343

    Thus we see that 2 as an estimate of p is 2.8 below the ’true ’value in 2.7% of

    samples, 1.2 above in 34.3% of samples etc. Taking the average over all samples or equivalently the

    expectation over the sampling distribution, we see that

    )xE(  = - 3x0.027+3

    5x 0.189 +


    1  x 0.441 + 1 x 0.343 = - 0.2

    exactly, confirming the first result of Theorem 1.


  • 8/19/2019 PS - Handbook.pdf




    = 2222

    2 )2.0(343.0)1(441.03










    =  2

    12.1      which confirms the second result. The third result is verified for this example by averaging over all possible values of s2  thus:

    22 σ3.360.34300.4416




    Proof of Theorem 1





















    ]x[xVar n










     Note: x1, …,xn are independent (random sample)

    The theorem shows thatn

       is the standard deviation of the sampling distribution of    x. A sample

    estimate of this variability isn

     s  and is called the (estimated) standard error of the (sample) mean.

    Theorem 2 Central Limit Theorem says that as n  the sampling distribution of    x tends to a Normal distribution with the same mean and variance.

    The importance of this result is that we do not need to know the form or type of the original populationdistribution if our sample size is sufficiently large. We can use instead the Normal distribution for

  • 8/19/2019 PS - Handbook.pdf



    statistical inference with the knowledge that the probabilities we calculate will be good approximations

    to the true (but generally unknown) probabilities.




    n N  X 


    ,~      approximately for large n.

    Then using the properties of the Normal distribution we can say that for large n,






     X  P 



    can be found approximately for any specified value u without knowing the original form of the


    If, however, we do know the form of the population and it follows a Normal distribution, then for any 

    sample size n > 1 it can be shown that





    σμ, N~X



    Thus 1); N(0~nσ



    has a sampling distribution which is Standard Normal for any n (Table 4).

    As the population standard deviation    is often unknown, replacing it by the corresponding samplequantity s changes the sampling distribution.

    However, provided the underlying population is Normal, it can be shown that




    has a ‘Student’s t-distribution’ with v   ‘degrees of freedom’, where 1 nv   (named after W. S.Gossett, who took the pseudonym ‘Student’). The per centiles of this distribution are given in Table 7.

    Another distribution, which arises from random samples of Normal popu lations, is the ‘chi-square’

    distribution, whose percentage points are given in Table 8. It can be shown that




    χ ~σ



    the chi-square distribution with 1n   degrees of freedom, whatever the value of  X  . Yet anotherdistribution is the (Fisher) F-distribution with percentage points in Table 9. The F and   2    distributions

    are used for statistical inference on the variances of Normal populations as well as for wider application

    in Goodness-of-Fit tests.


    Using SPSS it is possible to check these distributional results empirically by generating a sufficient

  • 8/19/2019 PS - Handbook.pdf



    number of random samples from a Normal population.

    6.2 Exercises

    1.  The heights of 1000 students are approximately normally distributed with a mean of 174.5 cm anda standard deviation of 6.9 cm. If 200 random samples of size 25 are drawn from this population

    and the means recorded, determine

    (a) The expected mean and standard deviation of the sampling distribution of the mean.(b) The number of sample means that fall between 172.5 and 175.8 cm inclusive.(c) The number of sample means that falling below 172 cm.

    2.  Show that the sample variance is unchanged if a constant is added to or subtracted from each value

    of the sample.

    3.  If the size of a sample is 36 and the standard error of the mean is 2, what must the size of thesample become if the standard error is to be reduced to 1.2?

    4.  The amount of time that a drive-through bank teller depends on a customer is a random variablewith a mean 3.2 minutes and a standard deviation  1.6 minutes. If a random sample of 64customers is observed, find the probability that their mean time at the teller’s counter is

    a)  At most 2.7 minutes; b)  More than 3.5 minutes;

    c)  At least 3.2 minutes but less than 3.4 minutes.

    5.  If all possible samples of size 16 are drawn from a normal population with mean equal to 50 and

    standard deviation equal to 5, what is the probability that a sample mean   will fall in the intervalfrom   1.9 ,   0.4? Assume that the sample means can be measured to any degree ofaccuracy.

    6.3 Further Exercises (Probability and statistics: Walpole, Myers and Myers –  8th Edition) 

    1. Exercises - Page 275 - Question 8.18 –  8.20

  • 8/19/2019 PS - Handbook.pdf



  • 8/19/2019 PS - Handbook.pdf





    7.1 Introduction to Hypothesis Testing

    Definition 1 The null hypothesis H 0 is a statement about the value of the parameter of interest. A simple

    null hypothesis specifies the population distribution exactly. We examine the data to see whether they

    support ‘or provide evidence against the null hypothesis H0.

    The alternative hypothesis H1 describes only the possibilities (there may be many) that we are prepared

    to consider if H0 is not true.

    Definition 2 The test statistic for H0 versus H1 is a random variable with known (or approximately

    known) distribution-assuming H0 to be true ‘under H0’. The observed value of the test statistic can

    indicate departures from H0 in favour of H1.

    Definition 3 The P-value gives the probability of, under H 0 , observing a value of the test statistic at

    least as extreme as the value actually observed, where extremities indicate departures from H0 in favour

    of H1. If the P-value is as small or smaller than , we say the test is statistically significant.

    7.2 Procedure

     Null and Alternative Hypotheses A clear statement of both should be given in terms of the population

     parameter of interest, together with a short verbal interpretation.

    Test Statistics: The formula in terms of sample statistics such as mean and standard deviation should

     be stated with the (sampling) distribution under the null hypothesis. Then the observed value of the

    test statistic should be calculated to at least three significant figures.

    Assess evidence: The P-value should be used to form a verbal statement or conclusion regarding the

    truth or otherwise of the null hypothesis. Finally a verbal interpretation of this conclusion should be

    given for the non-statistician.

    Depending on the conclusion reached (if any) the investigator may wish to quote a confidence interval

    for the parameter at the desired level.

    Example 7.1: Articles produced by a manufacturer should have mean length 4 cm. and standard

    deviation 0.02cm. A test sample of size 10 from a large batch of production has x  = 4.01. Is there

    evidence that the unknown mean length μ , say, of articles in the batch is unsatisfactory?

  • 8/19/2019 PS - Handbook.pdf



    The Null hypothesis is H0 :   = 4 (batch satisfactory) to be tested against the alternative

    H1 : μ  4 (batch unsatisfactory).

    We need a test statistic whose distribution is known under the null hypothesis i.e. assuming H 0 to be

    true. We know that in general for random samples from a Normal population


    σ., N(μ~X



    so, under H 0 


    (0.02), N(4~X






    is standard Normal . Large values of Z (either positive or negative) indicate departures from H 0 in

    favour of H1 and the observed value of Z is




    So, the probability of observing a value of Z at least as extreme as this (the P-value) is

     P (Z> 1.58) + P( Z < -1.58) = 2 x P (Z> 1.58) = 0.1141,

    using the symmetry of the Normal distribution.

    Thus there is a 11.4% chance of observing this sample result or worse even if the batch is satisfactory.

    We therefore conclude that there is no evidence against the null hypothesis.

     Note that this was a ‘two-sided’ or ‘two-tailed’ test as the alternative hypothesis is ‘two sided’, namely

    4  . If there was a legal requirement of a maximum mean length of cm, then we would not be

    concerned with the possibility that     4. We would ask

    whether there was sufficient evidence in the data to make us worry about failing the requirement, and

    the test statistic and observed value would be the same as before. Only large positive and not negative

    values of Z would indicate departures in favour of H1 so the P-value is just P(Z >1.58) = 0.057. Now

    we have slight evidence against H0 in favour of.H1 i.e. slight evidence that the batch may fail to meet

    the legal requirement. This is called a ‘one-tailed’ or ‘one-sided’ test as the alternative hypothesis is

    “one-sided’, namely  > 4.

    However, the assumption that the population variance 2 is known is often unrealistic:

    Example 7.2: A random sample of 5 men had a mean height x  of 70 inches and a sample standard

    deviation s of 2 inches. Is there any evidence in these data against the (null) hypothesis that the mean

     of the population is 67 inches? To test H0 :   = 67 versus H1 :     67 we need a test statistic whosedistribution is known under H0. Such a statistic is



      ~ t0 under Ho

  • 8/19/2019 PS - Handbook.pdf



    That is, Student’s ‘t’ with 4 degrees of freedom. The observed value of T  is 3.35 so to calculate P we

    must refer this value to percentage points of the t-distribution with 4 degrees of freedom (d.o.f.) Now

    t4(0.025) = 2.776 lie on either side of our observed value.

    Alternatively we can use the 2-values on the second row of Table 7 to arrive at the same answer. We

    cannot therefore say exactly what the probability of obtaining a value at least as extreme as the oneobserved is, but we can specify it within a suitable range and this is sufficient to enable us to conclude

    that there is moderate evidence against the null hypothesis. So even this small sample provides


    7.3 Confidence Intervals for Hypothesis Testing

    Often we may be asked to estimate the population mean, , rather than testing a hypothesis about it.Or we may have performed a test and found evidence against the null hypothesis casting doubt on our

    original hypothesised value. We can (and indeed must) give an estimate of uncertainty along with our

     best estimate of p, which is ~, the sample mean.

    Whatever the value of   



     X  P 



    cross multiplying we get

    95.0]/96.1/96.1[     n X n P         

    Subtracting X  gives

    95.0]/96.1/96.1[     n X n X  P         

    and finally multiplying by —  1 gives,

    95.0]/96.1/96.1[     n X n X  P         

    and this is true whatever the value of , so we can say that the random interval)/96.1,/96.1(   n X n X          has a probability of 0.95 of containing or covering the value of  ;

    that is, 95% of all samples will give intervals (calculated according to this formula) which contain the

    true value of the population mean. This interval is called a 95% confidence interval for  . Note that

    there is no guarantee that any specific sample contains  with 95% probability.

    In general a 100(1 —  )% confidence interval for    is given by


     x       )2/(1 ]

    where )2/(1    denotes an upper percentage point of the standard Normal distribution when  2 is

    known, and given by


     st  x   )2/(1   ]

    when 2

    is unknown.

  • 8/19/2019 PS - Handbook.pdf



    Example 7.1 revisited: The above argument can be used for a 95% confidence interval as we are

    assuming that the population variance 2 is known. Thus)10/02.096.1,10/02.096.1(     X  X   

    is a 95% confidence interval for p and substituting the observed value  x  = 4.01 we obtain (3.9976,


    This interval includes 4 cm. Therefore, no evidence to say that the articles in the batch are


    Example 7.2 revisited :  When 2 is unknown,


    which contains 4 cm with probability 0.95 as t4(0.025) = 2.776 from Table 7, there being o