Probability Models and Their Parametric Estimation



    PROBABILITY MODELS AND THEIR PARAMETRIC ESTIMATION

    NET/JRF/CSIR EXAMINATIONS

    A. SANTHAKUMARAN


Dr. A. Santhakumaran

    Associate Professor and Head

    Department of Statistics

    Salem Sowdeswari College

    Salem - 636010

Tamil Nadu

E-mail: ask.stat@yahoo.com


    About the Author

Dr. A. Santhakumaran is an Associate Professor and Head, Department of Statistics, at Salem Sowdeswari College, Salem - 10, Tamil Nadu. He holds a Ph.D. in Statistics - Mathematics from the Ramanujan Institute for Advanced Study in Mathematics, University of Madras. His interests are in Stochastic Processes and Their Applications. He has to his credit over 31 research papers in Feedback Queues, Statistical Quality Control and Reliability Theory. He is the author of the book Fundamentals of Testing Statistical Hypotheses and Research Methodology.


    Acknowledgments

My special thanks to the Correspondent and Secretary of Salem Sowdeswari College, Salem, and my colleagues for their enthusiastic and unstinted support rendered for publishing this book. I am grateful to Professor V. Thangaraj, RIASM, University of Madras, for his encouragement in writing the book. My greatest debt is to Dr. J. Subramaniam, Professor of Mathematics, Bannari Amman Institute of Technology, Sathyamangalam, who read most of the manuscript and whose critical comments resulted in numerous significant improvements. My thanks to Mr. G. Narayanan, Ramanujan Institute Computer Centre, RIASM, University of Madras, for the suggestions rendered by him towards the successful completion of the LaTeX typesetting of the book.

Finally, I wish to express my gratitude to all my teachers under whose influence I have come to appreciate statistics as the science of winding and twisting network, connecting Mathematics, Scientific Philosophy, Computer Software and other intellectual sources of the Millennium.

A. SANTHAKUMARAN


    PREFACE

Even though the science of Statistics originated more than 200 years ago, it was recognized as a separate discipline in India only in the early 1940s. From then till now, statistics has been evolving as a versatile, powerful and indispensable instrument for analyzing the statistical data of real life problems. We have reached a stage where no empirical science can afford to ignore the science of Statistics, since the diagnosis and recognition of patterns can be achieved through it. Because of the speedy growth of modern science and technology, one who learns statistics must have capacity, knowledge and intellect. A bird has the capacity to imitate when it is taught. A child is not born with a language, but it is born with an innate capacity to learn language. So when we teach the child, the child manipulates the structure and creates sentences; a bird cannot do this. The child thus has both the knowledge and the capacity to create new sentences. Ability and knowledge, together with inventiveness and innovation, constitute intellect.

If a student has ability, knowledge and intellect, then he will be able to learn and implement statistics successfully. If these three faculties are lacking, learning statistics will not be possible. We shall give a number of examples drawn from the story of the improvement of natural knowledge and the success of decision making. It shows how statistical ideas have played an important role in scientific investigations and other decision making processes. The most successful man in life is one who makes the best decision based on the available information. Practically, it is a very difficult task to take a decision on a real life problem. We illustrate this with the help of the following examples.

One wants to know in how many ways a bread can be divided into two equivalent parts. Immediately one reflects that it can be divided in a finite number of ways. In fact, the bread can be divided into two equivalent parts in an infinite number of ways. Naturally every article can have infinite dimensions. Our interest of study may be one dimension, namely the length of the bread; area ( = length × breadth ), two dimensions; volume ( = length × breadth × height ), three dimensions; and so on. Analogous to these are the measures of average (location), measures of variability (scale) and measures of skewness and kurtosis (shape).

Another example: a new two wheeler is introduced by a manufacturer in the market. The manufacturer wants to announce how many kilometers per litre the two wheeler gives on the road. For this purpose, the manufacturer rides the two wheeler on the road three times and observes that it gives 50 km per litre, 55 km per litre and 60 km per litre respectively. It immediately comes to mind that the two wheeler gives $(50 + 55 + 60)/3 = 55$ km per litre. This is absolutely wrong. Actually the two wheeler gives 60 km per litre, the value of the maximum order statistic.

A cyclist pedals from his house to his college at a speed of 10 mph and returns from the college to his house at a speed of 15 mph. He wants to know his average speed. One assumes that the distance between the house and the college is $x$ miles. Then the average speed of the cyclist is

$$\frac{\text{Total distance}}{\text{Total time taken}} = \frac{2x}{\frac{x}{10} + \frac{x}{15}} = 12 \text{ mph},$$

which is the Harmonic Mean.

Seven students and a master want to cross a river from one side to the other. The

students are not able to swim across the river. The master measures the average height of the students, which is 5.5. He also measures the depth of the river from one side to the other at 10 places: 2, 2.5, 4, 5.5, 6, 6.5, 10, 2.5, 1.5, 1, which gives an average depth of 4.15. The master takes the decision to cross the river on foot, since the average height of the students is greater than the average depth of the river. The students fail to cross the river, since at some places the depth of the river is more than 5.5. The master is not happy with his decision. The master would have succeeded in his decision if the minimum height of the students had been greater than the maximum depth of the river.

Keeping this in mind, the first chapter of the book deals with some of the well known distributions and the recognition of patterns of statistical distributions. Chapter 2 gives the criteria of point estimation. Chapter 3 illustrates the properties of complete families of distributions. Chapter 4 focuses on the study of optimal estimation. Chapter 5 explains the methods of estimation. Chapter 6 discusses interval estimation. Chapter 7 consists of Bayesian estimation.


    DISTINCTIVE FEATURES

• Care has been taken to provide conceptual clarity, simplicity and up to date materials.

• Properly graded and solved problems to illustrate each concept and procedure are presented in the text.

• About 300 solved problems and 50 remarks.

• A chapter on complete family of distributions.

• It is intended to serve as a text book for a one semester course on Statistical Inference for Under - Graduate and Post - Graduate Statistics students of Indian universities and other Applicable Sciences, Allied Statistical Courses, Mathematical Sciences and various Competitive Examinations like ISS, UGC Junior Fellowship, SLET, NET etc.

Salem - 636010
January 2010
A. Santhakumaran


    CONTENTS

1 Diagnosis of Statistical Pattern
1.1 Introduction
1.2 Collection of Data
1.3 Diagnosing the Probability Models with Data
1.4 Discrete Probability Models
1.5 Continuous Probability Models
1.6 Diagnosis of Probability Models
1.7 Quantile - Quantile Plot

2 Criteria of Point Estimation
2.1 Introduction
2.2 Point Estimator
2.3 Problems of Point Estimation
2.4 Criteria of the Point Estimation
2.5 Consistency
2.6 Sufficient Condition for Consistency
2.7 Unbiased Estimator
2.8 Sufficient Statistic
2.9 Neyman Factorizability Criterion
2.10 Exponential Family of Distributions
2.11 Distribution Admitting Sufficient Statistic
2.12 Joint Sufficient Statistics
2.13 Efficient Estimator

3 Complete Family of Distributions
3.1 Introduction
3.2 Completeness
3.3 Minimal Sufficient Statistic

4 Optimal Estimation
4.1 Introduction
4.2 Uniformly Minimum Variance Unbiased Estimator
4.3 Uncorrelatedness Approach
4.4 Rao - Blackwell Theorem
4.5 Lehmann - Scheffe Theorem
4.6 Inequality Approach
4.7 Cramer - Rao Inequality
4.8 Chapman - Robbins Inequality
4.9 Efficiency
4.10 Extension of Cramer - Rao Inequality
4.11 Cramer - Rao Inequality - Multiparameter Case
4.12 Bhattacharya Inequality

5 Methods of Estimation
5.1 Introduction
5.2 Method of Maximum Likelihood Estimation
5.3 Numerical Methods of Maximum Likelihood Estimation
5.4 Optimum Property of MLE
5.5 Method of Minimum Variance Bound Estimation
5.6 Method of Moment Estimation
5.7 Method of Minimum Chi - Square Estimation
5.8 Method of Least Square Estimation
5.9 Gauss Markoff Theorem

6 Interval Estimation
6.1 Introduction
6.2 Confidence Intervals
6.3 Alternative Method of Confidence Intervals
6.4 Shortest Length Confidence Intervals

7 Bayes Estimation
7.1 Introduction
7.2 Bayes Point Estimation
7.3 Bayes Confidence Intervals

References
Glossary of Notation
Appendix
Answers to Problems
Index


    1. DIAGNOSIS OF STATISTICAL PATTERN

    1.1 Introduction

Statistics is a decision making tool which aims to resolve real life problems. It originated more than 2000 years ago, but it was recognized as a separate discipline in India only from 1940. From then till now, statistics has been evolving as a versatile, powerful and indispensable instrument for investigation in all fields of real life problems. It provides a wide variety of analytical tools. We have reached a stage where no empirical science can afford to ignore the science of statistics, since the diagnosis and recognition of patterns can be achieved through it.

Statistics is a method of obtaining and analyzing data in order to take decisions on them. In India, during the period of Chandra Gupta Maurya there was an efficient system of collecting official and administrative statistics. During Akbar's reign (1556 - 1605 AD) people maintained good records of land and agricultural statistics. Statistical surveys were also conducted during his reign.

Sir Ronald A. Fisher, known as the Father of Statistics, placed statistics on a very sound footing by applying it to various diversified fields. His contributions led to a very respectable position for statistics among the sciences.

Professor P. C. Mahalanobis is the founder of statistics in India. He was a physicist by training, a statistician by instinct and an economist by conviction. The Government of India observes 29th June, the birthday of Professor Prasanta Chandra Mahalanobis, as National Statistics Day. Professor C. R. Rao is an Indian legend whose career spans the history of modern statistics. He is considered by many to be the greatest living statistician in the world today.

There are many definitions of the term statistics. Some authors have defined statistics as statistical data (plural sense) and others as statistical methods (singular sense).

    Statistics as Statistical Data

Yule and Kendall state: "By statistics we mean quantitative data affected to a marked extent by multiplicity of causes." Their definition points out the following characteristics:

• Statistics are aggregates of facts.
• Statistics are affected to a marked extent by multiplicity of causes.
• Statistics are numerically expressed.
• Statistics are enumerated or estimated according to reasonable standards of accuracy.
• Statistics are collected in a systematic manner.
• Statistics are collected for a pre - determined purpose.
• Statistics should be placed in relation to each other.


    Statistics as Statistical Methods

One of the best definitions of statistics is given by Croxton and Cowden. They define statistics as the science which deals with the collection, analysis and interpretation of numerical data. This definition points out the scientific ways of:

• Data collection
• Data presentation
• Data analysis
• Data interpretation

    Statistics as Statistical Models and Methods

Statistics is an imposing form of Mathematics. The usage of statistical methods expanded briskly in the late 20th century, because statistical models and methods have great application value in many inter - disciplinary sciences. So we define Statistics as the science of winding and twisting network connecting Mathematics, Scientific Philosophy, Computer Software and other intellectual sources of the millennium.

This definition reveals that statisticians work to translate real life problems into mathematical models by using assumptions or axioms or principles. Then they derive exact solutions by their knowledge, thereby intellectually validating the results, and express their merits in non - mathematical forms which make for the consistency of real life problems.

In real life problems, there are many situations where the actions of the entities within the system under study cannot be predicted with 100 percent perfection. There is always some variation. The variation can be classified into two categories: variation due to assignable causes, which has to be identified and eliminated; and variation due to chance causes, which lies within the $6\sigma$ values. This is also called natural variation. In general, the reduction of natural variation is not necessary and involves more cost, so it is not feasible to reduce the natural variation. However, some appropriate statistical patterns of recognition may well describe the causes of variations.

An appropriate statistical pattern of recognition can be diagnosed by repeated sampling of the phenomenon of interest. Then, through the systematic study of these data, a statistician can obtain a known distribution suitable for the data and estimate the parameters of the distribution. A statistician takes continuous effort in the selection of a distribution form.

    There are four steps in the diagnosis of a statistical distribution. They are

    (i) Data collection

Data collection for real life problems often requires substantial knowledge of the problem, planning time and resource commitment.

    (ii) Identification of statistical pattern

    When the data are available, identification of a probability distribution begins


by developing a frequency distribution or Histogram of the data. Based on the pattern of the frequency distribution and knowledge of the nature and behaviour of the process, a family of distributions is chosen.

(iii) Parameter selection

Choose parameters that determine a specific instance of a distribution family when the data are available. These parameters are estimated from the data.

    (iv) Validity of the distribution

The validity of the chosen distribution and the associated parameters is evaluated with the help of statistical tests. The validity of the various assumptions made on the parameters is assessed at a certain level of significance only.

    If the chosen distribution is not a good approximation of the data, then the analyst

    goes to the second step, chooses a different family of distributions and repeats the

    procedure.

If several iterations of this procedure fail to give a fit between an assumed distributional form and the collected data, then the empirical form of the distribution may be used.

    1.2 Collection of Data

Collection of data is one of the important tasks in finding a solution for real life problems. Even if the statistical pattern of the real life problem is valid, if the data are inaccurately collected, inappropriately analyzed or not representative of the real life problem, then the data will be misleading when used for decision making.

    One can learn data collection from an actual experience. The following sug-

    gestions may enhance and facilitate data collection. Data collection and analysis must

    be tackled with great care.

(i) Before collecting data, planning is very important. It could commence with a practice of pre - observing experience. Try to collect the data while pre - observing. Forms for recording the data are devised for these purposes. It is very likely that these forms will have to be modified several times before the actual data collection begins. Watch for unusual situations or circumstances and consider how they will be handled. Planning is very important even if the data are collected automatically. After collecting the data, find out whether the collected data are appropriate or not.

(ii) If the data being collected are adequate to diagnose the statistical distribution, then determine the apt distribution. If the data being collected are useless for diagnosing the statistical distribution, then there is no need to collect superfluous data.

(iii) Try to combine homogeneous data sets. Check data for homogeneity in successive time periods and, during the same time period, on successive intervals of time.


    (iv) Beware of the possibility of data censoring, in which a quantity of interest is not

    observed in its entirety. This problem most often occurs when the analyst is

    interested in the time required to complete some process but the process begins

    prior to or finishes after the completion of the observation period. Censoring can

    result in especially long process times being left out of the data sample.

(v) One may use a scatter diagram, which indicates the relationship between the two variables of interest.

    (vi) Consider the possibility that a sequence of observations which appear to be in-

    dependent may possess autocorrelation. Autocorrelation may exist in successive

    time periods.

    1.3 Diagnosis of a distribution with data

The methods for selecting families of distributions are applicable only if statistical data are available. The specific distribution within a family is specified by estimating its parameters. Estimating the parameters of a family of distributions leads to the theory of estimation.

The formation of a frequency distribution or Histogram is useful in guessing the shape of a distribution. Hines and Montgomery state that the number of class intervals should be chosen approximately equal to the square root of the sample size. If the intervals are too wide, the Histogram will be coarse or blocky, and its shape and other details will not represent the data smoothly. So one has to allow the interval sizes to change until a good choice is found. The Histogram for continuous data corresponds to the probability density function of a theoretical distribution. If continuous, a line drawn through the centre point of each class interval frequency should result in a shape like that of a probability density function (pdf) (see Figure 1.2).

A Histogram for discrete data, where there are a large number of data points, should have a cell for each value in the range of the data. However, if there are few data points, it may be necessary to combine adjacent cells to eliminate the ragged appearance of the Histogram. If the Histogram is associated with discrete data, it should look like a probability mass function (pmf) (see Figure 1.1).

    1.4 Discrete Distributions

Discrete random variables are used to describe random phenomena in which only integer values can occur. The following are some important distributions.

    1.4.1 Bernoulli distribution

An experiment consists of $n$ trials, where each trial results in a success or a failure and each trial is repeated under the same conditions. Let $X_j = 1$ if the $j$th trial results in a success and $X_j = 0$ if it results in a failure, so the sample space has the values 0 and 1. The trials are independent, each trial has only two possible outcomes (success or failure) and the probability of success remains constant from trial to trial. For one trial the pmf

$$p_\theta(x) = \begin{cases} \theta^x (1-\theta)^{1-x} & x = 0, 1, \; 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

is the Bernoulli distribution. Under the above assumptions, if in a production process $X$ denotes the quality of a produced item, then $X$ is a Bernoulli random variable.

1.4.2 Binomial Distribution

Let $X$ be a random variable denoting the number of successes in $n$ Bernoulli trials. Then $X$ is called a Binomial random variable with parameters $n$ and $\theta$. Here the sample space is $\{0, 1, 2, \cdots, n\}$ and the pmf is

$$p_\theta(x) = \begin{cases} \frac{n!}{x!(n-x)!}\, \theta^x (1-\theta)^{n-x} & x = 0, 1, \cdots, n, \; 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

In the Binomial distribution, the mean is always greater than the variance. If $X_1, X_2, \cdots, X_n$ are independent and identically distributed Bernoulli random variables, then $\sum_{i=1}^n X_i \sim b(n, \theta)$. Problems relating to tossing a coin or throwing dice lead to the Binomial distribution. In a production process, the number $x$ of defective units in a random sample of $n$ units follows a Binomial distribution.
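As a quick numerical check of these facts, the sketch below (Python, illustrative only; the parameter values $n = 10$ and $\theta = 0.3$ are assumptions for the example) evaluates the Binomial pmf, confirms mean > variance, and compares the relative frequency of a sum of iid Bernoulli draws with the pmf.

```python
import math
import random

def binom_pmf(x, n, theta):
    # p(x) = n!/(x!(n-x)!) * theta^x * (1-theta)^(n-x)
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta = 10, 0.3                # assumed values for illustration
print(n * theta, n * theta * (1 - theta))   # mean 3.0 > variance 2.1

# A sum of n iid Bernoulli(theta) variables behaves like b(n, theta)
random.seed(1)
sums = [sum(random.random() < theta for _ in range(n)) for _ in range(100_000)]
print(sums.count(3) / len(sums), binom_pmf(3, n, theta))  # both near 0.267
```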

1.4.3 Geometric Distribution

A random variable $X$ is related to a sequence of Bernoulli trials in which the number of trials $(x + 1)$ required to achieve the first success has pmf

$$p_\theta(x) = \begin{cases} \theta (1-\theta)^x & x = 0, 1, 2, \cdots, \; 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

It is the probability that the event $\{X = x\}$ occurs, i.e., that there are $x$ failures followed by a success.

A couple decides to have children until they have a male child. If the probability of having a male child in the family is $p$, they wish to know how many children to expect before the first male child is born. $X$ denotes the number of children of the couple preceding the first male child. The probability that there are $x$ female children preceding the first male child is given by a Geometric random variable.
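A minimal sketch of the male - child example, assuming $p = 0.5$ purely for illustration: the pmf $\theta(1-\theta)^x$ gives the probability of $x$ female children before the first male child, and the simulated mean matches the theoretical value $(1-\theta)/\theta$.

```python
import random

def geom_pmf(x, theta):
    # P{X = x}: x failures followed by the first success
    return theta * (1 - theta) ** x

def children_before_first_male(theta):
    # Simulate births until the first male child; count the females
    x = 0
    while random.random() >= theta:
        x += 1
    return x

theta = 0.5                        # assumed probability of a male child
print((1 - theta) / theta)         # expected female children: 1.0
random.seed(2)
draws = [children_before_first_male(theta) for _ in range(100_000)]
print(sum(draws) / len(draws), geom_pmf(1, theta))  # mean near 1.0; P{X=1} = .25
```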

1.4.4 Negative Binomial Distribution

If $X_1, X_2, \cdots, X_n$ are iid Geometric variables, then $T = t(X) = \sum_{i=1}^n X_i$ is a Negative Binomial variate whose pmf is

$$p_\theta(t) = \begin{cases} \frac{(t+n-1)!}{t!(n-1)!}\, \theta^n (1-\theta)^t & t = 0, 1, \cdots \\ 0 & \text{otherwise} \end{cases}$$

A random variable $X$ is related to a sequence of Bernoulli trials in which the probability of $x$ failures preceding the $n$th success in $(x + n)$ trials is given by

$$p_\theta(x) = \begin{cases} \frac{(x+n-1)!}{(n-1)!\,x!}\, \theta^n (1-\theta)^x & x = 0, 1, 2, \cdots \\ 0 & \text{otherwise} \end{cases}$$

This happens if the last trial results in a success and among the previous $(n + x - 1)$ trials there are exactly $x$ failures. Note that if $n = 1$, then $p_\theta(x)$ is the Geometric distribution. The Negative Binomial distribution has Mean < Variance. In a production process, the number of units required to reach the $n$th defective in $x + n$ units follows a Negative Binomial distribution.
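The representation of $T$ as a sum of iid Geometric variables can be checked by simulation; the sketch below (illustrative values $n = 3$, $\theta = 0.4$, chosen arbitrarily) compares the relative frequency of $T = 2$ with the Negative Binomial pmf.

```python
import math
import random

def nbinom_pmf(t, n, theta):
    # p(t) = (t+n-1)!/(t!(n-1)!) * theta^n * (1-theta)^t
    return math.comb(t + n - 1, t) * theta**n * (1 - theta)**t

def geometric(theta):
    # Number of failures before the first success
    x = 0
    while random.random() >= theta:
        x += 1
    return x

n, theta = 3, 0.4                  # assumed values for illustration
random.seed(3)
totals = [sum(geometric(theta) for _ in range(n)) for _ in range(100_000)]
print(totals.count(2) / len(totals), nbinom_pmf(2, n, theta))  # both near 0.138
```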

1.4.5 Multinomial Distribution

If the sample space of a random experiment is split into more than two mutually exclusive and exhaustive events, then one can define a random variable which leads to the Multinomial distribution. Let $E_1, E_2, \cdots, E_k$ be $k$ mutually exclusive and exhaustive events of a random experiment with respective probabilities $\theta_1, \theta_2, \cdots, \theta_k$, such that $\theta_1 + \theta_2 + \cdots + \theta_k = 1$ and $0 < \theta_i < 1, \; i = 1, 2, \cdots, k$. Then the probability that $E_1$ occurs $x_1$ times, $E_2$ occurs $x_2$ times, $\cdots$, $E_k$ occurs $x_k$ times in $n$ independent trials is known as the Multinomial distribution with pmf

$$p_{\theta_1, \theta_2, \cdots, \theta_k}(x_1, x_2, \cdots, x_k) = \begin{cases} \frac{n!}{x_1!\, x_2! \cdots x_k!}\, \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_k^{x_k} & \text{where } \sum_{i=1}^k x_i = n \\ 0 & \text{otherwise} \end{cases}$$

If $k = 2$, that is, the number of mutually exclusive events is only two, then the Multinomial distribution becomes a Binomial distribution:

$$p_{\theta_1, \theta_2}(x_1, x_2) = \begin{cases} \frac{n!}{x_1!\, x_2!}\, \theta_1^{x_1} \theta_2^{x_2} & \text{where } x_1 + x_2 = n \text{ and } \theta_1 + \theta_2 = 1 \\ 0 & \text{otherwise} \end{cases}$$

That is, $x_2 = n - x_1$ and $\theta_2 = 1 - \theta_1$, which implies

$$p_{\theta_1}(x_1) = \begin{cases} \frac{n!}{x_1!(n-x_1)!}\, \theta_1^{x_1} (1-\theta_1)^{n-x_1} & 0 < \theta_1 < 1, \; x_1 = 0, 1, \cdots, n \\ 0 & \text{otherwise} \end{cases}$$

Consider two brands A and B. Each individual in the population prefers brand A to brand B with probability $\theta_1$, prefers B to A with probability $\theta_2$, and is indifferent between brands A and B with probability $\theta_3 = 1 - \theta_1 - \theta_2$. In a random sample of $n$ individuals, $X_1$ prefer brand A, $X_2$ prefer brand B and $X_3$ prefer some brand other than A and B. Then the three random variables follow a Trinomial distribution, i.e.,

$$p_{\theta_1, \theta_2, \theta_3}(x_1, x_2, x_3) = P\{X_1 = x_1, X_2 = x_2, X_3 = x_3\} = \begin{cases} \frac{n!}{x_1!\, x_2!\, x_3!}\, \theta_1^{x_1} \theta_2^{x_2} \theta_3^{x_3} & x_1 + x_2 + x_3 = n \\ 0 & \text{otherwise} \end{cases}$$
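A sketch evaluating the Trinomial pmf of the brand - preference example; the preference probabilities $\theta_1 = 0.5$, $\theta_2 = 0.3$ (hence $\theta_3 = 0.2$) and the sample split are assumed values for illustration only.

```python
import math

def trinomial_pmf(x1, x2, x3, t1, t2, t3):
    # n!/(x1! x2! x3!) * t1^x1 * t2^x2 * t3^x3 with x1 + x2 + x3 = n
    n = x1 + x2 + x3
    coef = math.factorial(n) // (math.factorial(x1) * math.factorial(x2) * math.factorial(x3))
    return coef * t1**x1 * t2**x2 * t3**x3

t1, t2 = 0.5, 0.3                  # assumed brand preference probabilities
t3 = 1 - t1 - t2
# P{X1 = 4, X2 = 3, X3 = 3} in a sample of n = 10 individuals
print(trinomial_pmf(4, 3, 3, t1, t2, t3))   # about 0.0567
```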


1.4.6 Discrete Uniform Distribution

A random variable $X$ is said to follow the Uniform distribution on $N$ points $(x_1, x_2, \cdots, x_N)$ if its pmf is given by

$$p_N(x) = P_N\{X = x_i\} = \begin{cases} \frac{1}{N} & i = 1, 2, \cdots, N \text{ and } N \in I^+ \\ 0 & \text{otherwise} \end{cases}$$

A random experiment with complete uncertainty but whose outcomes have equal probabilities may describe the Uniform distribution. In a finite population of $N$ units, selecting any unit $x_i, \; i = 1, 2, \cdots, N$ from the population with the simple random sampling technique leads to a discrete Uniform distribution.

1.4.7 Hypergeometric Distribution

One situation in which Bernoulli trials are encountered is that in which an object is drawn at random from a collection of objects of two types in a box. In order to repeat this experiment so that the results are independent and identically distributed, it is necessary to replace each object drawn and to mix the objects before the next one is drawn. This process is referred to as sampling with replacement. If the sampling is done without replacement of the objects drawn, the resulting trials are still of the Bernoulli type but no longer independent.

For example, four balls are drawn one at a time, at random and without replacement, from 8 balls in a box, 3 black and 5 red. The probability that the third ball drawn is black is

$$P\{\text{3rd ball black}\} = P(RRB) + P(RBB) + P(BRB) + P(BBB)$$
$$= \frac{5}{8}\cdot\frac{4}{7}\cdot\frac{3}{6} + \frac{5}{8}\cdot\frac{3}{7}\cdot\frac{2}{6} + \frac{3}{8}\cdot\frac{5}{7}\cdot\frac{2}{6} + \frac{3}{8}\cdot\frac{2}{7}\cdot\frac{1}{6} = \frac{3}{8}$$

which is the same as the probability that the first ball drawn is black. It should not be surprising that this probability for a black ball is the same on the third draw as on the first draw.

In the general case, $n$ objects are to be drawn at random, one at a time, from a collection of $N$ objects, $M$ of one kind and $N - M$ of another kind. The one kind of object will be thought of as a success and coded 1; the other kind is coded 0. Let $X_1, X_2, \cdots, X_n$ denote the sequence of coded outcomes; that is, $X_i$ is 1 or 0 according to whether the $i$th draw results in success or failure. The total number of successes in $n$ trials is just the sum of the $X$'s,

$$S_n = X_1 + X_2 + \cdots + X_n$$

as it was in the case of independent identically distributed Bernoulli trials. That is, the probability of a 1 on the $i$th trial is the same at each trial:

$$P\{X_i = 1\} = \frac{M}{N} \quad i = 1, 2, \cdots, n$$


One can observe first that the probability of a given sequence of $n$ objects is

$$\frac{1}{N} \cdot \frac{1}{N-1} \cdots \frac{1}{N-n+1}$$

The probability that an object of type 1 occurs in the $i$th position of the sequence is

$$P\{X_i = 1\} = \frac{M(N-1)(N-2)\cdots(N-n+1)}{N(N-1)\cdots(N-n+2)(N-n+1)} = \frac{M}{N} \quad i = 1, 2, \cdots, n$$

where $M$ is the number of ways of filling the $i$th position with an object coded 1 and $(N-1)(N-2)\cdots(N-n+1)$ is the number of ways of filling the remaining $(n-1)$ places in the sequence from the $(N-1)$ remaining objects. It does not matter whether the $n$ objects are drawn one at a time at random or all $n$ are drawn simultaneously at random. The probability function of $S_n$ is

$$P\{S_n = k\} = \begin{cases} \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}} & k = 0, 1, 2, \cdots, \min(n, M) \\ 0 & \text{otherwise} \end{cases}$$

The random variable $S_n$ with the above probability function is said to have a Hypergeometric distribution. The mean of the random variable $S_n$ is easily obtained from the representation of a Hypergeometric variable as a sum of Bernoulli trials. That is,

$$E[S_n] = E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$$
$$= 1 \cdot P\{X_1 = 1\} + 0 \cdot P\{X_1 = 0\} + \cdots + 1 \cdot P\{X_n = 1\} + 0 \cdot P\{X_n = 0\}$$
$$= \frac{M}{N} + \cdots + \frac{M}{N} = \frac{nM}{N}$$

$$\text{Variance of } S_n = n\,\frac{M}{N}\cdot\frac{N-M}{N}\cdot\frac{N-n}{N-1} \quad N \in I^+ \quad (1.1)$$

If $p = \frac{M}{N}$ denotes the probability at each trial that the object drawn is of the type of which there are initially $M$, then

$$\text{Variance of } S_n = npq\,\frac{N-n}{N-1} \quad N \in I^+ \quad (1.2)$$


The above formula (1.2) differs from the Binomial variance $npq$ by the extra factor $\frac{N-n}{N-1}$. The variance of $S_n$ is $npq\,\frac{N-n}{N-1}$ in the no replacement case and $npq$ in the replacement case for fixed $p$ and fixed $n$; the factor $\frac{N-n}{N-1} \to 1$ as $N \to \infty$. Thus the Hypergeometric distribution is exact whereas the Binomial distribution is an approximate one.

Fifty students of the M.Sc. Statistics programme in a certain college are divided at random into 5 batches of 10 each for the annual practical examination in Statistics. The class consists of 20 resident students and 30 non - resident students. $X$ denotes the number of resident students in the first batch appearing for the practical examination. The Hypergeometric distribution is apt to describe the random variable $X$, which has the pmf

$$P\{X = x\} = \begin{cases} \frac{\binom{20}{x}\binom{30}{10-x}}{\binom{50}{10}} & x = 0, 1, 2, \cdots, 10 \\ 0 & \text{otherwise} \end{cases}$$
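The pmf of this student - batch example can be evaluated directly; a minimal sketch, which also verifies that the mean equals $nM/N = 10 \cdot 20/50 = 4$ and shows the finite population correction in the variance:

```python
import math

def hypergeom_pmf(x, N, M, n):
    # P{X = x} = C(M, x) C(N-M, n-x) / C(N, n)
    return math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)

N, M, n = 50, 20, 10               # 50 students, 20 resident, batch of 10
print(hypergeom_pmf(4, N, M, n))   # P{X = 4}, about 0.280
print(sum(x * hypergeom_pmf(x, N, M, n) for x in range(n + 1)))  # mean = 4.0

# Variance with the finite population correction (N-n)/(N-1)
p = M / N
print(n * p * (1 - p) * (N - n) / (N - 1))
```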

1.4.8 Poisson Distribution

The Poisson random variable is used to describe rare events, for example the number of air crashes occurring on a Monday between 3 pm and 5 pm. The pmf of a Poisson random variable is given by

$$p_\lambda(x) = \begin{cases} \frac{e^{-\lambda}\lambda^x}{x!} & \lambda > 0, \; x = 0, 1, 2, \cdots \\ 0 & \text{otherwise} \end{cases}$$

where $\lambda$ is a parameter. One of the important properties of the Poisson distribution is that the mean and variance are the same and are equal to $\lambda$. If $X_1, X_2, \cdots, X_n$ are iid Poisson random variables with parameter $\lambda$, then the sum $\sum_{i=1}^n X_i$ follows a Poisson distribution with parameter $n\lambda$.

After correcting 50 pages of the proof of a book, the proof readers find that there are, on average, 2 errors per 5 pages. One would like to know the number of pages with 0, 1, 2, 3 errors in 10000 pages of the first print of the book. $X$ denotes the number of errors per page; then the random variable $X$ follows the Poisson distribution with parameter $\lambda = \frac{2}{5} = .4$.

1.4.9 Power Series Distribution

If a random variable $X$ follows a Power series distribution, then its pmf is

$$P_\theta\{X = x\} = \begin{cases} \frac{a_x \theta^x}{f(\theta)} & x \in S; \; a_x \geq 0, \; \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $f(\theta)$ is a generating function, i.e., $f(\theta) = \sum_{x \in S} a_x \theta^x, \; \theta > 0$, so that $f(\theta)$ is positive, finite and differentiable, and $S$ is a non - empty countable subset of the non - negative integers.


Particular cases:

(i) Binomial Distribution

Let $\theta = \frac{p}{1-p}$, $f(\theta) = (1+\theta)^n$ and $S = \{0, 1, 2, 3, \cdots, n\}$, a set of non - negative integers. Then

$$f(\theta) = \sum_{x \in S} a_x \theta^x \quad \Rightarrow \quad (1+\theta)^n = \sum_{x=0}^{n} a_x \theta^x \quad \Rightarrow \quad a_x = \binom{n}{x}$$

$$P_p\{X = x\} = \begin{cases} \binom{n}{x} \dfrac{\left(\frac{p}{1-p}\right)^x}{\left[1 + \frac{p}{1-p}\right]^n} = \binom{n}{x} p^x q^{n-x} & x = 0, 1, 2, \cdots, n \\ 0 & \text{otherwise} \end{cases}$$

(ii) Negative Binomial Distribution

Let $\theta = \frac{p}{1+p}$, $f(\theta) = (1-\theta)^{-n}$ and $S = \{0, 1, 2, \cdots\}$, with $0 < \theta < 1$ and $n \in I^+$. Now

$$f(\theta) = \sum_{x \in S} a_x \theta^x \quad \Rightarrow \quad (1-\theta)^{-n} = \sum_{x=0}^{\infty} a_x \theta^x$$

$$a_x = (-1)^x \binom{-n}{x} = (-1)^x (-1)^x \binom{n+x-1}{x} = \binom{n+x-1}{x}$$

$$P\{X = x\} = \binom{n+x-1}{x} \left(\frac{p}{1+p}\right)^x \left[1 - \frac{p}{1+p}\right]^n = \binom{n+x-1}{x} p^x (1+p)^{-(n+x)}$$
$$= \binom{-n}{x} (-p)^x (1+p)^{-(n+x)} \quad x = 0, 1, 2, \cdots$$

(iii) Poisson Distribution

Let $f(\theta) = e^\theta$ and $S = \{0, 1, 2, \cdots\}$. Now

$$f(\theta) = \sum_{x \in S} a_x \theta^x \quad \Rightarrow \quad e^\theta = \sum_{x=0}^{\infty} \frac{\theta^x}{x!} = \sum_{x=0}^{\infty} a_x \theta^x \quad \Rightarrow \quad a_x = \frac{1}{x!}$$

$$P\{X = x\} = \frac{a_x \theta^x}{f(\theta)} = \frac{1}{x!} \cdot \frac{\theta^x}{e^\theta} = \frac{e^{-\theta}\theta^x}{x!} \quad x = 0, 1, 2, \cdots$$
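A numerical check of the Poisson particular case: evaluating the general power series pmf with $f(\theta) = e^\theta$ and $a_x = 1/x!$ reproduces the Poisson pmf exactly ($\theta = 2.5$ is an arbitrary illustrative value).

```python
import math

def power_series_pmf(x, theta, a, f):
    # P{X = x} = a_x * theta^x / f(theta)
    return a(x) * theta**x / f(theta)

theta = 2.5                        # arbitrary illustrative value
p = power_series_pmf(3, theta, a=lambda x: 1 / math.factorial(x), f=math.exp)
print(p, math.exp(-theta) * theta**3 / math.factorial(3))   # identical values
```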

1.5 Continuous Distributions

A continuous random variable can be used to describe random phenomena in which the variable $X$ of interest can take any value $x$ in some interval, with $P\{X = x\} = 0$ for every $x$ in that interval.

1.5.1 Uniform Distribution

A random variable $X$ is uniformly distributed on an interval $[a, b]$ if its pdf is given by

$$p_{a,b}(x) = \begin{cases} \frac{1}{b-a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$$

Note that $P\{x_1 < X < x_2\} = F(x_2) - F(x_1) = \frac{x_2 - x_1}{b-a}$ is proportional to the length of the interval, for all $x_1$ and $x_2$ satisfying $a \leq x_1 \leq x_2 \leq b$. If a random phenomenon has complete unpredictability, then it can be described by the Uniform distribution.

1.5.2 Normal Distribution

A random variable $X$ with mean $\mu$ ($-\infty < \mu < \infty$) and variance $\sigma^2 > 0$ has a Normal distribution if it has the pdf

$$p_{\mu,\sigma^2}(x) = \begin{cases} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}[x - \mu]^2} & -\infty < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

The sum of a number of component effects of a random experiment can be thought of as a Normal distribution. The time to assemble a product, which is the sum of the times required for each assembly operation, may describe a Normal random variable.

1.5.3 Exponential Distribution

A random variable $X$ is said to be Exponentially distributed with parameter $\lambda > 0$ if its pdf is given by

$$p_\lambda(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

The value of the intercept on the vertical axis is always equal to the value of $\lambda$, and the Exponential distribution has its mode at the origin. The mean and standard deviation are equal in the Exponential distribution. In a random phenomenon, the time between independent events which have the memoryless property may appropriately follow an Exponential random variable. For example, the times between the arrivals of a large number of customers who act independently of each other may fit the Exponential distribution adequately.

1.5.4 Gamma Distribution

A function used to define the Gamma distribution is the Gamma function. A random variable $X$ follows a Gamma distribution if

$$p_{\beta,\theta}(x) = \begin{cases} \frac{\beta\theta}{\Gamma(\beta)}\,(\beta\theta x)^{\beta-1} e^{-\beta\theta x} & x > 0, \; \beta > 0, \; \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $\beta$ is called the shape parameter and $\theta$ is called the scale parameter. $\sum_{i=1}^n X_i \sim G(n, \frac{1}{\theta})$ if each $X_i \sim \exp(\frac{1}{\theta})$. The cumulative distribution function $F(x) = P\{X \leq x\}$ of the random variable $X$ is given by

$$F(x) = \begin{cases} \frac{1}{\Gamma(\beta)} \int_0^x \beta\theta\,(\beta\theta t)^{\beta-1} e^{-\beta\theta t}\, dt & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

1.5.5 Erlang Distribution

The pdf of the Gamma distribution becomes the Erlang distribution of order $k$ when $\beta = k$, an integer. When $\beta = k$, a positive integer, the cumulative distribution function $F(x)$ is given by

$$F(x) = \begin{cases} 1 - \sum_{i=0}^{k-1} \frac{e^{-k\theta x}(k\theta x)^i}{i!} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

in which the summation is a sum of Poisson terms with mean $k\theta x$.
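The closed form above can be checked numerically. The sketch below compares the Poisson - sum expression for $F(x)$ with a midpoint - rule integral of the corresponding Gamma pdf (shape $k$, rate $k\theta$); the values $k = 3$, $\theta = 0.5$, $x = 4$ are assumptions chosen only for illustration.

```python
import math

def erlang_cdf(x, k, theta):
    # F(x) = 1 - sum_{i=0}^{k-1} e^{-k*theta*x} (k*theta*x)^i / i!
    return 1 - sum(math.exp(-k * theta * x) * (k * theta * x) ** i / math.factorial(i)
                   for i in range(k))

def gamma_pdf(t, shape, rate):
    # Gamma density with the given shape and rate
    return rate**shape * t**(shape - 1) * math.exp(-rate * t) / math.gamma(shape)

k, theta, x = 3, 0.5, 4.0          # assumed illustrative values
m = 100_000                        # midpoint-rule integration of the pdf on [0, x]
h = x / m
integral = sum(gamma_pdf((i + 0.5) * h, k, k * theta) for i in range(m)) * h
print(erlang_cdf(x, k, theta), integral)   # both about 0.938
```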

1.5.6 Weibull Distribution

A random variable $X$ has a Weibull distribution if it has pdf

$$p_{\nu,\alpha,\beta}(x) = \begin{cases} \frac{\beta}{\alpha}\left(\frac{x-\nu}{\alpha}\right)^{\beta-1} \exp\left[-\left(\frac{x-\nu}{\alpha}\right)^\beta\right] & x \geq \nu \\ 0 & \text{otherwise} \end{cases}$$

The three parameters of the Weibull distribution are $\nu$ ($-\infty < \nu < \infty$), which is the location parameter, $\alpha$ ($\alpha > 0$), which is the scale parameter, and $\beta$ ($\beta > 0$), which is the shape parameter. When $\nu = 0$ the Weibull pdf becomes

$$p_{\alpha,\beta}(x) = \begin{cases} \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta-1} \exp\left[-\left(\frac{x}{\alpha}\right)^\beta\right] & x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

When $\nu = 0$ and $\beta = 1$, the Weibull distribution is reduced to the Exponential distribution with pdf

$$p_\alpha(x) = \begin{cases} \frac{1}{\alpha}\, e^{-x/\alpha} & x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$


1.5.7 Triangular Distribution

A random variable $X$ has a Triangular distribution if its pdf is given by

$$p_{a,b,c}(x) = \begin{cases} \frac{2(x-a)}{(b-a)(c-a)} & a \leq x \leq b \\ \frac{2(c-x)}{(c-b)(c-a)} & b < x \leq c \\ 0 & \text{otherwise} \end{cases}$$

where $a \leq b \leq c$. The mode occurs at $x = b$. Since $E[X] = \frac{a+b+c}{3}$ and $a \leq b \leq c$, it follows that $\frac{2a+c}{3} \leq E[X] \leq \frac{a+2c}{3}$. The mode is used more often than the mean to characterize the Triangular distribution.

1.5.8 Empirical Distribution

An empirical distribution may be either continuous or discrete in nature. It is used to establish a statistical model for the available data whenever there is a discrepancy in the assumed distribution or whenever one is unable to arrive at a known distribution.

(a) Empirical Continuous Distributions

The times taken to install 100 machines were collected. The data are given in Table 1.1, which gives the number of machines together with the time taken. For example, 30 machines were installed between 0 and 1 hour, 25 between 1 and 2 hours, 20 between 2 and 3 hours and 25 between 3 and 4 hours. $X$ denotes the time taken to install a machine.

Table 1.1 Distribution of the time taken to install the machines

Duration of Hours | Frequency | p(x) | F(x) = P{X ≤ x}
0 ≤ x ≤ 1 | 30 | .30 | .30
1 < x ≤ 2 | 25 | .25 | .55
2 < x ≤ 3 | 20 | .20 | .75
3 < x ≤ 4 | 25 | .25 | 1.00

(b) Empirical Discrete Distributions

At the end of the day, the numbers of shipments on the loading docks of an export company are observed as 0, 1, 2, 3, 4 and 5 with frequencies 23, 15, 12, 10, 25 and 15 respectively. Let $X$ be the number of shipments on the loading docks of the company at the end of the day. Then $X$ is a discrete random variable which takes the values 0, 1, 2, 3, 4 and 5 with the distribution given in Table 1.2. Figure 1.1 is the Histogram of the number of shipments on the loading docks of the company.


Table 1.2 Distribution of number of shipments

Number of shipments x | Frequency | P{X = x} | F(x) = P{X ≤ x}
0 | 23 | .23 | .23
1 | 15 | .15 | .38
2 | 12 | .12 | .50
3 | 10 | .10 | .60
4 | 25 | .25 | .85
5 | 15 | .15 | 1.00
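One common use of such an empirical distribution is to generate synthetic observations from it. A minimal sketch of inverse - CDF sampling from Table 1.2 (the sample size of 100000 draws is arbitrary):

```python
import random

values = [0, 1, 2, 3, 4, 5]                    # shipments, from Table 1.2
cdf    = [0.23, 0.38, 0.50, 0.60, 0.85, 1.00]  # F(x) = P{X <= x}

def sample_empirical():
    # Inverse-CDF method: smallest value whose cumulative probability covers U
    u = random.random()
    for v, F in zip(values, cdf):
        if u <= F:
            return v

random.seed(4)
draws = [sample_empirical() for _ in range(100_000)]
print([round(draws.count(v) / len(draws), 3) for v in values])
# close to the relative frequencies .23, .15, .12, .10, .25, .15
```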

Figure 1.1 Histogram of shipments (frequency versus number of shipments)

    1.6 Diagnosis of distributions

If the variable of interest measures occurrences accumulating in constant increments at a constant rate, an Exponential distribution is apt to fit the data. If the variable of an item can deviate equally in the positive or negative direction, then a Normal distribution is appropriate to the data. When the variable of interest seems to follow the Normal probability distribution but the random variable is restricted to be greater than or less than a certain value, the truncated Normal distribution will be adequate to fit the data. The Gamma and Weibull distributions are also used to describe the data. The Exponential distribution is a special case of both the Gamma and the Weibull distributions. The differences between the Exponential, Gamma and Weibull distributions involve the location of the modes of the pdfs and the shapes of their tails for large and small times. The Exponential distribution has its mode at the origin, but the Gamma and Weibull distributions have their modes at some point ($\geq 0$) which is a function of the parameter values selected. The tail of the Gamma distribution is long, like that of an Exponential distribution, while the tail of the Weibull distribution may decline more rapidly or less rapidly than that of an Exponential distribution. In practice, if there are higher values of the variable than an Exponential distribution can account for, a Weibull distribution may provide a better fit to the data.

Illustration 1.6.1

Sixteen equipments were produced and placed on test; Table 1.3 gives the lengths of the time intervals between failures in hours.

Table 1.3 Equipment times between failures

Equipment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
Time between failures | 19 | 12 | 16 | 1 | 15 | 5 | 10 | 1 | 46 | 7 | 33 | 25 | 4 | 9 | 1 | 10

For the sake of simplicity in processing the data, one can set up the ordered set as given below:

Table 1.4 Ordered set of equipment times between failures

Equipment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
Time between failures | 1 | 1 | 1 | 4 | 5 | 7 | 9 | 10 | 10 | 12 | 15 | 16 | 19 | 25 | 33 | 46

On this basis, one may construct a Histogram to judge the pattern of the data in Table 1.4. An approximate value of the class interval can be determined from the formula

$$t = \frac{\text{maximum value} - \text{minimum value}}{1 + 3.3 \log_{10} N}$$

where the maximum and minimum are the values in the ordered set and $N$ is the total number of items of the order statistics. In this case the maximum value is 46, the minimum value is 1 and $N$ is 16. Thus $t = \frac{45}{1 + 3.3 \log_{10} 16} = 9.05 \approx 10$ = width of the class interval.

Table 1.5 Frequency distribution

Time interval | 0 - 10 | 10 - 20 | 20 - 30 | 30 - 40 | 40 - 50
Number of equipments | 9 | 4 | 1 | 1 | 1


The Histogram drawn from the frequency distribution in Table 1.5 is given in Figure 1.2.

Figure 1.2 Histogram of times to failure (number of equipments versus time interval)

The Histogram reveals that the distribution could be a Negative Exponential or the right portion of a Normal distribution. Assume the time to failure follows an Exponential distribution of the form

$$p_\lambda(x) = \begin{cases} \lambda e^{-\lambda x} & \lambda > 0, \; x > 0 \\ 0 & \text{otherwise} \end{cases}$$

How far this assumption is valid has to be verified. The validity of the assumption is tested by the $\chi^2$ test of goodness of fit.

Table 1.6 Distribution of times to failure

Interval | p_i | Expected frequency E | Observed frequency O
0 - 10 | .5262 | 8.41 ≈ 8 | 9
10 - 20 | .2493 | 3.98 ≈ 4 | 4
20 - 30 | .1181 | 1.886 ≈ 2 | 1
30 - 40 | .0559 | .8944 ≈ 1 | 1
40 - 50 | .0265 | .454 ≈ 1 | 1

    26

    A. Santhakumaran

  • 8/6/2019 Probability Models and Their Parametric Estimation

    27/237

where $p_i = \int_{x_i}^{x_{i+1}} \lambda e^{-\lambda x}\, dx = e^{-\lambda x_i} - e^{-\lambda x_{i+1}}$, with $x_i = 0, 10, 20, \cdots, 50$. If the cell frequencies are less than 5, cells are pooled so that each is 5 or more. One then gets two classes only, i.e., the expected frequencies are equal to 8 each and the corresponding observed frequencies are 9 and 7 respectively. The $\chi^2$ test of goodness of fit fails to test the validity of the assumption that the sample data come from an Exponential distribution with parameter $\lambda = \frac{1}{13.38} = .0747$ = failure rate per unit hour, where the mean life time of the equipments is $\frac{214}{16} = 13.38$ hours. To test the validity of the assumption that the time to failure follows an Exponential distribution, consider the likelihood function of the cell frequencies $o_1 = 9$ and $o_2 = 7$:

$$L = \begin{cases} \frac{n!}{o_1!\, o_2!} \left(\frac{e_1}{n}\right)^{o_1} \left(\frac{e_2}{n}\right)^{o_2} & o_1 + o_2 = n \\ 0 & \text{otherwise} \end{cases}$$

Under $H_0$ the likelihood function follows a Binomial probability law $b(16, p)$ where $p = \frac{e_1}{n}$. To test the hypothesis $H_0$: the fit is the best one vs $H_1$: the fit is not the best one is equivalent to testing the hypothesis $H_0: p \leq .5$ vs $H_1: p > .5$. The UMP level $\alpha = .05$ test is given by

$$\phi(x) = \begin{cases} 1 & \text{if } x > 11 \\ .17 & \text{if } x = 11 \\ 0 & \text{otherwise} \end{cases}$$

The observed value is 9, which is less than 11. There is no evidence to reject the hypothesis $H_0$: the data come from an Exponential distribution at the 5% level of significance. Thus the time to failure of the equipments follows an Exponential distribution. One may conclude that on average an equipment can be operated for 13.38 hours without failure.
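The arithmetic of this illustration is easy to reproduce. A minimal sketch that recomputes the mean life, the failure rate $\lambda$ and the cell probabilities $p_i$ of Table 1.6 from the raw data of Table 1.3:

```python
import math

times = [19, 12, 16, 1, 15, 5, 10, 1, 46, 7, 33, 25, 4, 9, 1, 10]
mean_life = sum(times) / len(times)     # 214/16 = 13.38 hours
lam = 1 / mean_life                     # failure rate, about .0747 per hour
print(mean_life, lam)

# Cell probabilities p_i = e^{-lam*x_i} - e^{-lam*x_{i+1}} and expected counts
edges = [0, 10, 20, 30, 40, 50]
for lo, hi in zip(edges, edges[1:]):
    p = math.exp(-lam * lo) - math.exp(-lam * hi)
    print(f"{lo}-{hi}: p = {p:.4f}, expected = {len(times) * p:.2f}")
```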

    1.7 Quantile - Quantile plot

The construction of Histograms and the recognition of a distributional shape are necessary ingredients for selecting a family of distributions to represent sample data. A Histogram is not useful for evaluating the fit of the chosen distribution. When there are a small number of data points ($\leq 30$), a Histogram can be rather ragged. Further, the perception of the fit depends on the widths of the Histogram intervals. Even if the intervals are well chosen, grouping the data into cells makes it difficult to compare a Histogram with a continuous pdf. A quantile - quantile (q - q) plot is a useful tool for evaluating a distribution fit that does not suffer from these problems.

If $X$ is a random variable with cumulative distribution function $F(x)$, then the $q$ quantile of $X$ is that value $y$ such that $F(y) = P\{X \leq y\} = q$ for $0 < q < 1$. When $F(x)$ has an inverse, $y = F^{-1}(q)$. Let $x_1, x_2, \cdots, x_n$ be sample observations of $X$. Order the observations from the smallest to the largest and denote them by $y_j, \; j = 1$ to $n$, where $y_1 \leq y_2 \leq \cdots \leq y_n$. Denote by $j$ the rank or order number; therefore $j = 1$ for the smallest and $j = n$ for the largest. The q - q plot is based on the fact that $y_j$ is an estimate of the $\left(\frac{j - \frac{1}{2}}{n}\right)$ quantile of $X$, i.e., $y_j$ is approximately $F^{-1}\left(\frac{j - \frac{1}{2}}{n}\right)$.


A distribution with cumulative distribution function $F(x)$ is a possible representation of the random variable $X$. If $F(x)$ is a member of an appropriate family of distributions, then a plot of $y_j$ versus $F^{-1}\left(\frac{j - \frac{1}{2}}{n}\right)$ will be approximately a straight line. If $F(x)$ is from an appropriate family of distributions and also has appropriate parameter values, then the line will have slope 1. On the other hand, if the assumed distribution is inappropriate, the points will deviate from a straight line in a systematic manner. The decision whether to accept or reject some hypothesized distribution is subjective.

In the construction of a q - q plot, the following should be borne in mind:

(i) The observed values will never fall exactly on a straight line.
(ii) The ordered values are not independent, since they have been ranked.
(iii) The variances of the extremes are much higher than the variances in the middle of the plot. Greater discrepancies can be accepted at the extremes. The linearity of the points in the middle of the plot is more important than the linearity at the extremes.

Illustration 1.7.1

A sample of 20 repair times of an electronic watch was considered. The repair time $X$ is a random variable; the values of $X$ are in seconds. The values are arranged in increasing order of magnitude in Table 1.7.

Table 1.7 Repair times of an electronic watch

j | Value | j | Value | j | Value | j | Value
1 | 88.54 | 6 | 88.82 | 11 | 88.98 | 16 | 89.26
2 | 88.56 | 7 | 88.85 | 12 | 89.02 | 17 | 89.30
3 | 88.60 | 8 | 88.90 | 13 | 89.08 | 18 | 89.35
4 | 88.64 | 9 | 88.95 | 14 | 89.18 | 19 | 89.41
5 | 88.75 | 10 | 88.97 | 15 | 89.25 | 20 | 89.45


Table 1.8 Normal quantiles

j | (j - 1/2)/20 | y_j = F^{-1}((j - 1/2)/20) | x_j = 0.08 y_j + 88.993
1 | .025 | -1.96 | 88.84
2 | .075 | -1.41 | 88.88
3 | .125 | -1.13 | 88.90
4 | .175 | -0.93 | 88.92
5 | .225 | -0.75 | 88.94
6 | .275 | -0.60 | 88.95
7 | .325 | -0.45 | 88.96
8 | .375 | -0.31 | 88.97
9 | .425 | -0.18 | 88.98
10 | .475 | -0.06 | 88.99
11 | .525 | 0.06 | 89.00
12 | .575 | 0.18 | 89.01
13 | .625 | 0.31 | 89.02
14 | .675 | 0.45 | 89.03
15 | .725 | 0.60 | 89.04
16 | .775 | 0.75 | 89.05
17 | .825 | 0.93 | 89.07
18 | .875 | 1.13 | 89.08
19 | .925 | 1.41 | 89.11
20 | .975 | 1.96 | 89.15

The ordered observations in Table 1.8 are then plotted versus $F^{-1}\left(\frac{j-\frac{1}{2}}{n}\right)$ for $j = 1, 2, \cdots, 20$, where $F(\cdot)$ is the cumulative distribution function of the Normal random variable $X$ with mean 88.993 seconds and standard deviation .08 seconds, to obtain the q - q plot. The plotted values are shown in Figure 1.3. The general perception of a straight line is quite clear in the q - q plot, supporting the hypothesis of a normal distribution.
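The quantiles of Table 1.8 can be reproduced with a few lines of Python; `NormalDist.inv_cdf` from the standard library supplies $F^{-1}$, and the fitted mean 88.993 and standard deviation .08 are taken from the illustration.

```python
from statistics import NormalDist

# Ordered repair times from Table 1.7
x = [88.54, 88.56, 88.60, 88.64, 88.75, 88.82, 88.85, 88.90, 88.95, 88.97,
     88.98, 89.02, 89.08, 89.18, 89.25, 89.26, 89.30, 89.35, 89.41, 89.45]
n = len(x)

# Fitted Normal quantiles F^{-1}((j - 1/2)/n), j = 1, ..., n
fitted = NormalDist(mu=88.993, sigma=0.08)
q = [fitted.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]

# Plotting x[j] against q[j] should give approximately a line of slope 1
for xj, qj in zip(x, q):
    print(round(xj, 2), round(qj, 2))
```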

Figure 1.3 q - q plot of the repair times (Normal quantile $y_j$ versus time $x_j$)

Note: The diagnoses of statistical distributions of real life problems are not exact; at best they represent reasonable approximations.

    Problems

1.1 The mean and variance of the number of defective items drawn randomly one by one with replacement from a lot are found to be 10 and 6 respectively. The distribution of the number of defective items is:

    (a) Poisson with mean 10


(b) Binomial with n = 25 and p = 0.4
(c) Normal with mean 10 and variance 6

    (d) None of the above

1.2 If $X$ is a Poisson random variate with mean 3, then $P\{|X - 3| < 1\}$ will be:
(a) $\frac{1}{2}e^{-3}$ (b) $3e^{-3}$ (c) $4.5e^{-3}$ (d) $27e^{-3}$

1.3 Let $U_{(1)}, U_{(2)}, \cdots, U_{(n)}$ be the order statistics of a random sample $U_1, U_2, \cdots, U_n$ of size $n$ from the Uniform $(0, 1)$ distribution. Then the conditional distribution of $U_1$ given $U_{(n)} = u_{(n)}$ is given by:
(a) Uniform on $(0, u_{(n)})$
(b) $P\{U_1 = u_{(n)}\} = \frac{1}{n}$, and with probability $\frac{n-1}{n}$, $U_1$ is uniformly distributed over $(0, u_{(n)})$
(c) Beta$\left(\frac{1}{n}, \frac{n-1}{n}\right)$
(d) Uniform $(0, 1)$

1.4 A biased coin is tossed 4 times or until a head turns up, whichever occurs earlier. The distribution of the number of tails turning up is:
(a) Binomial (b) Geometric (c) Negative Binomial (d) Hypergeometric

1.5 If $X$ and $Y$ are independent Exponential random variables with the same mean $\theta$, then the distribution of $\min(X, Y)$ is:
(a) Exponential with mean $2\theta$
(b) Exponential with mean $\theta$
(c) not Exponential with mean $\theta$
(d) Exponential with mean $\frac{\theta}{2}$

1.6 The $\chi^2$ goodness of fit test is based on the assumption that the character under study is:
(a) Normal (b) Non - Normal (c) any distribution (d) not required

1.7 The exact distribution of the $\chi^2$ goodness of fit statistic, when each experimental unit of a random sample of size $n$ is classified into one of $k$ categories, depends on the:
(a) Hypergeometric distribution

    (b) Normal distribution

    (c) Multinomial distribution

    (d) Binomial distribution

1.8 If $X_1 \sim b(n_1, \theta_1)$, $X_2 \sim b(n_2, \theta_2)$ and $X_1, X_2$ are independent, then the sum of the variates $X_1 + X_2$ is distributed as:
(a) Hypergeometric distribution

    (c) Poisson distribution

    (d) None of the above

1.9 If $X_1 \sim b(n_1, \theta)$, $X_2 \sim b(n_2, \theta)$ and $X_1, X_2$ are independent, then the sum of the variates $X_1 + X_2$ is distributed as:
(a) Hypergeometric distribution
(b) Binomial distribution


    (c) Poisson distribution

    (d) None of the above

1.10 If $X_1 \sim P(\lambda_1)$, $X_2 \sim P(\lambda_2)$ and $X_1, X_2$ are independent, then the sum of the variates $X_1 + X_2$ is distributed as:

    (b) Binomial distribution

    (c) Poisson distribution

    (d) None of the above

1.11 The skewness of a Binomial distribution will be zero if:
(a) $p < .5$ (b) $p > .5$ (c) $p = .5$ (d) $p \neq .5$

1.12 If the sample size $n = 2$, the Student's $t$ - distribution reduces to the:

    (a) Normal distribution

(b) F - distribution
(c) $\chi^2$ - distribution
(d) Cauchy distribution

1.13 The reciprocal property of the $F_{n_1,n_2}$ distribution can be expressed as:
(a) $F_{n_2,n_1}(1-\alpha) = \frac{1}{F_{n_1,n_2}(\alpha)}$
(b) $P\{F_{n_1,n_2} \leq c\} = P\left\{F_{n_2,n_1} \geq \frac{1}{c}\right\}$
(c) $F_{n_2,n_1}\left(1-\frac{\alpha}{2}\right) = \frac{1}{F_{n_1,n_2}\left(\frac{\alpha}{2}\right)}$
(d) All the above

1.14 The distribution for which the moment generating function is not useful in finding the moments is the:

    (a) Binomial distribution

    (b) Negative Binomial distribution

    (c) Hypergeometric distribution

    (d) Geometric distribution

1.15 The probability of selecting a unit from a population of $N$ units by the simple random sampling technique follows a:

    (a) Bernoulli distribution

    (b) Binomial distribution

    (c) Geometric distribution

    (d) discrete Uniform distribution

1.16 If a production process is a sequence of Bernoulli trials, the number $x$ of defective units in a sample of $n$ units has a:
(a) Bernoulli distribution

    (b) Binomial distribution

    (c) Multinomial distribution

    (d) Hypergeometric distribution

1.17 If a random variable $X$ is related to a sequence of Bernoulli trials in which the number of trials to achieve the first success is $(x + 1)$, then the distribution of $X$


    is :

    (a) Bernoulli distribution

    (b) Binomial distribution

    (c) Multinomial distribution

    (d) Geometric distribution

1.18 If $X_1, X_2, \cdots, X_n$ are iid Geometric variables, then $\sum_{i=1}^n X_i$ follows a:
(a) Negative Binomial distribution

    (b) Binomial distribution

    (c) Multinomial distribution

    (d) Geometric distribution

1.19 If a random variable $X$ is related to a sequence of Bernoulli trials in which there are $x$ failures preceding the $n$th success in $(x + n)$ trials, then $X$ has a:

    (b) Multinomial distribution

    (c) Negative Binomial distribution

    (d) Geometric distribution

    1.20 If a random experiment has only two mutually exclusive outcomes of a Bernoulli

    trial, then the random variable leads to:

    (a) Binomial distribution

    (b) Multinomial distribution

    (c) Negative Binomial distribution

    (d) Geometric distribution

1.21 A box contains $N$ balls, $M$ of which are white and $N - M$ red. If $X$ denotes the number of white balls in a sample of $n$ balls drawn with replacement, then $X$ is a:
(a) Binomial variate

    (b) Bernoulli variate

    (c) Negative Binomial variate

    (d) Hypergeometric variate

    1.22 The number of independent events that occur in a fixed amount of time may

    follow:

    (a) Exponential distribution

    (b) Poisson distribution

    (c) Geometric distribution

    (d) Gamma distribution

1.23 A power series distribution is

$$P_\theta\{X = x\} = \begin{cases} \frac{a_x \theta^x}{f(\theta)} & x \in S, \; a_x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

where $f(\theta) = (1 + \theta)^n$, $\theta = \frac{p}{1-p}$ and $S = \{0, 1, 2, \cdots, n\}$. Then the random variable $X$ has a:


    (a) Geometric distribution

    (b) Bernoulli distribution

    (c) Binomial distribution

    (d) Negative Binomial distribution

1.24 The probability function $p(x) = \dfrac{2}{3^{x+1}}$ for $x = 0, 1, 2, 3, \ldots$ represents a:

(a) Negative Binomial distribution

    (b) Binomial distribution

    (c) Bernoulli distribution

    (d) Geometric distribution

    1.25 Dinesh Kumar receives 2, 2, 4 and 4 telephone calls on 4 randomly selected days.

Assuming that the telephone calls follow a Poisson distribution, the estimate of the

    number of telephone calls in 8 days is:

    (a) 12 (b) 3 (c) 24 (d) none of the above

1.26 The exact distribution of the χ² goodness of fit test, when each experimental unit of a random sample of size n is classified into one of two categories, depends on the:

    (a) Hypergeometric distribution

    (b) Normal distribution

    (c) Multinomial distribution

    (d) Binomial distribution

1.27 The pmf of a random variable X is

$$p(x) = \begin{cases} \sum_{k=0}^{\infty} (-1)^k \binom{k+x}{k}\, \theta^{x+k} & x = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases}$$

    It is known as

(a) Binomial (b) Negative Binomial (c) Poisson (d) Geometric


    2. CRITERIA OF POINT ESTIMATION

    2.1 Introduction

In real life applications, determining an appropriate distribution from the random sample is a major task. A faulty assumption about the distribution will lead to misleading recommendations. Once a family of distributions indexed by a parameter has been selected, the next step is to estimate the parameters of the distribution. The criteria of the point estimators for many standard distributions are described in this chapter.

The set of all admissible values of the parameters of a distribution is called the parameter space $\Omega$. Any member of the parameter space is called a parameter. For example, a random variable X may be assumed to follow a normal distribution with mean $\mu$ and variance $\sigma^2$; the parameter space is $\Omega = \{(\mu, \sigma^2) \mid -\infty < \mu < \infty,\ 0 < \sigma^2 < \infty\}$. Suppose a random sample $X_1, X_2, X_3, \ldots, X_n$ is taken on X. A statistic $T = t(X)$ based on the sample $X_1, X_2, \ldots, X_n$ gives a value for the parameter $\theta$. The particular value of the statistic $T = t(x)$, computed from the observed values $x_1, x_2, \ldots, x_n$, is called an estimate of $\theta$. If the statistic $T = \bar{X}$ is used to estimate the unknown parameter $\theta$, then the sample mean is called an estimator of $\theta$. Thus an estimator is a rule or a procedure to estimate the value of $\theta$, while the numerical value $\bar{x}$ is called an estimate of $\theta$.

2.2 Point Estimator

Let $X_1, X_2, \ldots, X_n$ be n independent identically distributed (iid) random samples drawn from a population with probability density function (pdf) $p_\theta(x)$, $\theta \in \Omega$. The statistic $T = t(X)$ is said to be a point estimator of $\theta$ if the function $T = t(X)$ has a single point $t(x)$ which maps to $\theta$ in the parameter space $\Omega$.

2.3 Problems of Point Estimation

The problems involved in point estimation are

(i) to select or choose a statistic $T = t(X)$,

(ii) to find the distribution function of the statistic $T = t(X)$, and

(iii) to verify that the selected statistic satisfies the criteria of point estimation.

    2.4 Criteria of the Point Estimation

    The criteria of the point estimation are

    (i) Consistency

    (ii) Unbiasedness

    (iii) Sufficiency and

    (iv) Efficiency


    2.5 Consistency

Consistency is a convergence property of an estimator. It is an asymptotic or large sample size property. Let $X_1, X_2, \ldots, X_n$ be an iid random sample drawn from a population with common distribution $P_\theta$, $\theta \in \Omega$. An estimator $T = t(X)$ is consistent for $\theta$ if for every $\epsilon > 0$ and for each fixed $\theta \in \Omega$, $P_\theta\{|T - \theta| > \epsilon\} \to 0$ as $n \to \infty$, i.e., $T \xrightarrow{P} \theta$ as $n \to \infty$ for fixed $\theta \in \Omega$.

Example 2.1 Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a normal population with mean $\theta$ and known variance $\sigma^2$. The statistic $T = \bar{X}$ is chosen as an estimator of the parameter $\theta$. The statistic $\bar{X} \sim N(\theta, \frac{\sigma^2}{n})$. To test the consistency of the estimator, consider for every $\epsilon > 0$ and fixed $\theta \in \Omega$,

$$P_\theta\{|\bar{X} - \theta| > \epsilon\} = 1 - P_\theta\{|\bar{X} - \theta| < \epsilon\} = 1 - P_\theta\{-\epsilon < \bar{X} - \theta < \epsilon\} = 1 - P\left\{-\frac{\epsilon\sqrt{n}}{\sigma} < Z < \frac{\epsilon\sqrt{n}}{\sigma}\right\} \quad \text{where } Z = \frac{\bar{X} - \theta}{\sigma/\sqrt{n}} \sim N(0, 1)$$

$$= 1 - \left[\Phi\left(\frac{\epsilon\sqrt{n}}{\sigma}\right) - \Phi\left(-\frac{\epsilon\sqrt{n}}{\sigma}\right)\right] \to 1 - 1 = 0 \text{ as } n \to \infty.$$

Thus $\bar{X}$ is a consistent estimator of $\theta$.

Example 2.2 Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a Cauchy population with location parameter $\theta$ and scale 1. Then $\bar{X}$ again has the Cauchy distribution with parameter $\theta$, whatever the sample size. For every $\epsilon > 0$ and fixed $\theta \in \Omega$,

$$P_\theta\{|\bar{X} - \theta| > \epsilon\} = 1 - P_\theta\{\theta - \epsilon < \bar{X} < \theta + \epsilon\} = 1 - \int_{\theta-\epsilon}^{\theta+\epsilon} \frac{1}{\pi} \frac{1}{1 + (x - \theta)^2}\, dx$$

$$= 1 - \frac{1}{\pi} \int_{-\epsilon}^{\epsilon} \frac{1}{1 + z^2}\, dz \quad \text{where } z = x - \theta$$

$$= 1 - \frac{1}{\pi} \left[\tan^{-1}(z)\right]_{-\epsilon}^{\epsilon} = 1 - \frac{2}{\pi} \tan^{-1}(\epsilon) \quad \text{since } \tan^{-1}(-\epsilon) = -\tan^{-1}(\epsilon).$$

Thus $P_\theta\{|\bar{X} - \theta| > \epsilon\} \nrightarrow 0$ as $n \to \infty$, i.e., $\bar{X} \stackrel{P}{\nrightarrow} \theta$ as $n \to \infty$. For a Cauchy population the sample mean $\bar{X}$ is not a consistent estimator of the parameter $\theta$.
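The contrast between Examples 2.1 and 2.2 is easy to check by simulation. The following minimal Python sketch (the sample sizes, seed, replication count, and $\epsilon = 0.5$ are illustrative choices, not from the text) estimates $P\{|\bar{X} - \theta| > \epsilon\}$ for normal and Cauchy samples:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, theta, reps = 0.5, 2.0, 2000

for n in (10, 100, 1000):
    # Empirical P{|sample mean - theta| > eps} over many replications
    normal_means = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
    cauchy_means = (theta + rng.standard_cauchy((reps, n))).mean(axis=1)
    print(f"n={n:5d}  normal: {np.mean(np.abs(normal_means - theta) > eps):.3f}"
          f"   cauchy: {np.mean(np.abs(cauchy_means - theta) > eps):.3f}")
```

The normal proportion drops toward 0 as n grows, while the Cauchy proportion stays near $1 - \frac{2}{\pi}\tan^{-1}(0.5) \approx 0.70$ for every n.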

    2.6 Sufficient condition for consistency

Theorem 2.1 If $\{T_n\}_{n=1}^{\infty}$ is a sequence of estimators such that $E_\theta[T_n] \to \theta$ and $V_\theta[T_n] \to 0$ as $n \to \infty$, then the statistic $T_n$ is a consistent estimator of the parameter $\theta$.

Proof Consider

$$E[T_n - \theta]^2 = E\left(T_n - E[T_n] + E[T_n] - \theta\right)^2 = E\left(T_n - E[T_n]\right)^2 + \{E[T_n] - \theta\}^2 = V[T_n] + \{E[T_n] - \theta\}^2$$

since $E(T_n - E[T_n]) = 0$. By Chebychev's inequality,

$$P\{|T_n - \theta| > \epsilon\} \le \frac{1}{\epsilon^2} E[T_n - \theta]^2 = \frac{1}{\epsilon^2}\left[V[T_n] + \{E[T_n] - \theta\}^2\right] \to 0 \text{ as } n \to \infty,$$

since $V[T_n] \to 0$ and $E[T_n] \to \theta$ as $n \to \infty$. Therefore $T_n$ is a consistent estimator of $\theta$.

Remark 2.2 The conditions are only sufficient, not necessary. If $\{X_n\}_{n=1}^{\infty}$ is a sequence of iid random variables from a population with finite mean $\theta = E[X]$, then $\bar{X}$ converges to $\theta$ in probability for each fixed $\theta \in \Omega$. This is Khintchine's Weak Law of Large Numbers: as long as the mean exists finitely, the sample mean $\bar{X}$ is a consistent estimator of the population mean $\theta$, and this does not require the condition $V[\bar{X}] \to 0$ as $n \to \infty$ for every fixed $\theta \in \Omega$. Thus consistency follows from the existence of the expectation of the statistic; the assumption of finite variance of the statistic is not needed.

For illustration, the Cauchy pdf is

$$p(x) = \begin{cases} \dfrac{1}{\pi} \dfrac{1}{1 + x^2} & -\infty < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

The mean $E[X]$ does not exist finitely, i.e.,

$$E[X] = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{x}{1 + x^2}\, dx$$

is divergent. But the Cauchy principal value is

$$\frac{1}{\pi} \lim_{t \to \infty} \int_{-t}^{t} \frac{x}{1 + x^2}\, dx = \frac{1}{2\pi} \lim_{t \to \infty} \int_{-t}^{t} \frac{2x}{1 + x^2}\, dx = \frac{1}{2\pi} \lim_{t \to \infty} \left[\log(1 + x^2)\right]_{-t}^{t} = \frac{1}{2\pi} \lim_{t \to \infty} \left[\log(1 + t^2) - \log(1 + t^2)\right] = 0.$$

The Cauchy principal value 0 is taken as the mean of the Cauchy distribution. Thus the Cauchy distribution does not have a finitely existing mean. Hence, for the Cauchy population, the sample mean $\bar{X}$ is not a consistent estimator of the parameter $\theta$.

Example 2.3 Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a normal population $N(0, \sigma^2)$. Show that $\frac{1}{3n}\sum_{k=1}^{n} X_k^4$ is a consistent estimator of $\sigma^4$.

Let $T = \frac{1}{3n}\sum_{k=1}^{n} X_k^4$.

$$E_{\sigma^4}[T] = \frac{1}{3n} \sum_{k=1}^{n} E_{\sigma^4}[X_k^4] = \frac{1}{3n} \sum_{k=1}^{n} E_{\sigma^4}[X_k - 0]^4 \quad \text{since } E[X_k] = 0,\ k = 1, 2, \ldots$$

$$= \frac{1}{3n}\, n \mu_4 = \frac{1}{3n}\, n\, 3\sigma^4 = \sigma^4 \quad \text{since } \mu_4 = 3\sigma^4, \text{ where } \mu_{2n} = 1 \cdot 3 \cdot 5 \cdots (2n-1)\, \sigma^{2n},\ n = 1, 2, \ldots$$

$$V_{\sigma^4}[T] = \frac{1}{(3n)^2} \sum_{k=1}^{n} V_{\sigma^4}[X_k^4] = \frac{1}{(3n)^2} \sum_{k=1}^{n} \left( E_{\sigma^4}[X_k^8] - \left\{E_{\sigma^4}[X_k^4]\right\}^2 \right)$$

$$= \frac{1}{(3n)^2}\, n \left[105\sigma^8 - (3\sigma^4)^2\right] \quad \text{since } \mu_8 = 1 \cdot 3 \cdot 5 \cdot 7\, \sigma^8 = 105\sigma^8$$

$$= \frac{1}{3^2 n}\, 96\sigma^8 \to 0 \text{ as } n \to \infty.$$

Thus T is a consistent estimator of $\sigma^4$.

Example 2.4 Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a population with rectangular distribution $(0, \theta)$, $\theta > 0$. Show that $\left(\prod_{i=1}^{n} X_i\right)^{\frac{1}{n}}$ is a consistent estimator of $\theta e^{-1}$.


Let $GM = \left(\prod_{i=1}^{n} X_i\right)^{\frac{1}{n}}$, $X_i > 0$, $i = 1, 2, \ldots, n$. Then

$$\log_e GM = \frac{1}{n} \sum_{i=1}^{n} \log X_i$$

$$E[\log X] = \frac{1}{\theta} \int_{0}^{\theta} \log x\, dx = \frac{1}{\theta} [x \log x]_{0}^{\theta} - \frac{1}{\theta} \int_{0}^{\theta} dx = \log\theta - 1 \quad \text{since } \lim_{x \to 0} x \log x = 0,$$

because

$$\lim_{x \to 0} x \log x = \lim_{x \to 0} \frac{\log x}{\frac{1}{x}} = \lim_{x \to 0} \frac{\frac{1}{x}}{-\frac{1}{x^2}} = 0.$$

$$E[(\log X)^2] = \frac{1}{\theta} \int_{0}^{\theta} (\log x)^2\, dx = \frac{1}{\theta} \left[x(\log x)^2\right]_{0}^{\theta} - \frac{1}{\theta} \int_{0}^{\theta} 2x\, \frac{\log x}{x}\, dx = (\log\theta)^2 - 2\log\theta + 2 \quad \text{since } \lim_{x \to 0} x(\log x)^2 = 0.$$

$$V[\log X] = (\log\theta)^2 - 2\log\theta + 2 - (\log\theta - 1)^2 = 1$$

$$V[\log GM] = \frac{1}{n^2} \sum_{i=1}^{n} V[\log X_i] = \frac{1}{n} \to 0 \text{ as } n \to \infty,\ \theta > 0.$$

Thus $\log_e GM$ is a consistent estimator of $\log\theta - 1$, i.e., GM is a consistent estimator of $\theta e^{-1}$.
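A quick numerical check of Example 2.4 (a hypothetical simulation sketch; $\theta = 3$, the seed and the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 3.0
for n in (10, 1000, 100_000):
    x = rng.uniform(0.0, theta, size=n)
    gm = np.exp(np.mean(np.log(x)))   # geometric mean via logs avoids underflow
    print(f"n={n:7d}  GM = {gm:.4f}   theta/e = {theta / np.e:.4f}")
```

As n grows the geometric mean settles near $\theta/e \approx 1.1036$ for $\theta = 3$.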

Example 2.5 Let $X_1, X_2, \ldots, X_n$ be an iid random sample drawn from a population with $E[X_i] = \mu$ and $V[X_i] = \sigma^2$, $i = 1, 2, \ldots, n$. Prove that $\frac{2}{n(n+1)} \sum_{i=1}^{n} i X_i$ is a consistent estimator of $\mu$.

$$E\left[\sum_{i=1}^{n} i X_i\right] = E[X_1 + 2X_2 + \cdots + nX_n] = \mu + 2\mu + \cdots + n\mu = \mu[1 + 2 + \cdots + n] = \mu\, \frac{n(n+1)}{2}$$

$$\therefore\ E\left[\frac{2}{n(n+1)} \sum_{i=1}^{n} i X_i\right] = \mu,\ \mu \in \Omega$$

$$V\left[\sum_{i=1}^{n} i X_i\right] = \sum_{i=1}^{n} i^2\, V[X_i] = \sigma^2 \sum_{i=1}^{n} i^2 = \sigma^2\, \frac{n(n+1)(2n+1)}{6}$$

$$V\left[\frac{2}{n(n+1)} \sum_{i=1}^{n} i X_i\right] = \frac{4}{n^2(n+1)^2}\, \sigma^2\, \frac{n(n+1)(2n+1)}{6} = \frac{2\sigma^2}{3}\, \frac{(2n+1)}{n(n+1)^2} \to 0 \text{ as } n \to \infty.$$

Thus $\frac{2}{n(n+1)} \sum_{i=1}^{n} i X_i$ is a consistent estimator of $\mu$.

    Consistent estimator is not unique

Example 2.6 Let $T = \max_{1 \le i \le n}\{X_i\}$ be the nth order statistic of a random sample of size n drawn from a population with a uniform distribution on the interval $(0, \theta)$. The pdf of T is

$$p_\theta(t) = \begin{cases} \dfrac{n t^{n-1}}{\theta^n} & 0 < t < \theta,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

$$E[T] = \frac{n}{\theta^n} \int_{0}^{\theta} t^n\, dt = \frac{n\theta}{n+1}, \qquad E[T^2] = \frac{n\theta^2}{n+2}, \qquad V[T] = \frac{n\theta^2}{(n+2)(n+1)^2}.$$

Thus $E[T] \to \theta$ and $V[T] \to 0$ as $n \to \infty$, so T is a consistent estimator of $\theta$. Also $E\left[\frac{(n+1)}{n} T\right] = \theta$ and $V\left[\frac{(n+1)}{n} T\right] = \frac{\theta^2}{n(n+2)} \to 0$ as $n \to \infty$, i.e., $\frac{(n+1)}{n} T$ is also a consistent estimator of $\theta$. The statistics T and $\frac{(n+1)}{n} T$ are two consistent estimators of the same parameter $\theta$. Thus a consistent estimator is not unique.
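Both estimators of Example 2.6 can be compared by simulation. In this illustrative sketch ($\theta$, the seed, and the replication count are arbitrary), T underestimates $\theta$ for small n while $(n+1)T/n$ is centred on $\theta$, and both concentrate around $\theta$ as n grows:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, reps = 5.0, 20_000
for n in (5, 50, 500):
    t = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)   # T = max X_i
    t_adj = (n + 1) / n * t                                   # (n+1)T/n
    print(f"n={n:4d}  mean of T = {t.mean():.3f}   "
          f"mean of (n+1)T/n = {t_adj.mean():.3f}")
```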


    2.7 Invariance Property of Consistent Estimator

If $T = t(X)$ is a consistent estimator of $\theta$, then $a_n T$, $T + c_n$, and $a_n T + c_n$ are also consistent estimators of $\theta$, where $a_n = 1 + \frac{k}{n}$, $k \in \mathbb{R}$, so that $a_n \to 1$ and $c_n \to 0$ as $n \to \infty$ for every fixed $\theta \in \Omega$. In general, we have Theorem 2.2.

Theorem 2.2 If $T_n = t_n(X)$ is a consistent estimator of $\gamma(\theta)$ and $\psi(\gamma(\theta))$ is a continuous function of $\gamma(\theta)$, then $\psi(T_n)$ is a consistent estimator of $\psi(\gamma(\theta))$.

Proof Given $T_n = t_n(X)$ is a consistent estimator of $\gamma(\theta)$, i.e., $T_n \xrightarrow{P} \gamma(\theta)$ as $n \to \infty$. Therefore, for given $\epsilon > 0$ and $\eta > 0$, there exists a positive integer $N(\epsilon, \eta)$ such that

$$P\{|T_n - \gamma(\theta)| < \epsilon\} > 1 - \eta \quad \forall\ n \ge N.$$

Also $\psi(\cdot)$ is a continuous function, i.e., for every $\epsilon_1 > 0$ there exists an $\epsilon$ such that $|\psi(T_n) - \psi(\gamma(\theta))| < \epsilon_1$ whenever $|T_n - \gamma(\theta)| < \epsilon$.

For any two events A and B, if $A \subseteq B$, then $P(A) \le P(B)$. Let $A = \{|T_n - \gamma(\theta)| < \epsilon\}$ and $B = \{|\psi(T_n) - \psi(\gamma(\theta))| < \epsilon_1\}$; then $A \subseteq B$, so

$$P\{|\psi(T_n) - \psi(\gamma(\theta))| < \epsilon_1\} \ge P\{|T_n - \gamma(\theta)| < \epsilon\} > 1 - \eta \quad \forall\ n \ge N.$$

Hence $\psi(T_n) \xrightarrow{P} \psi(\gamma(\theta))$ as $n \to \infty$, i.e., $\psi(T_n)$ is a consistent estimator of $\psi(\gamma(\theta))$.

Example 2.7 Suppose $T = t(X)$ is a statistic with pdf $p_\theta(x)$ for $\theta > 0$, $\theta \in \Omega$. Prove that $T^2 = t^2(X)$ is a consistent estimator of $\theta^2$ if $T = t(X)$ is a consistent estimator of $\theta$.

Given $T = t(X)$ is a consistent estimator of $\theta$. By the definition of a consistent estimator, $P\{|T - \theta| < \epsilon\} \to 1$ as $n \to \infty$ for $\epsilon > 0$, $\theta \in \Omega$. Consider

$$P\{|T - \theta| < \epsilon\} = P\{\theta - \epsilon < T < \theta + \epsilon\} = P\{(\theta - \epsilon)^2 < T^2 < (\theta + \epsilon)^2\}$$

$$= P\{-2\epsilon\theta < T^2 - \theta^2 - \epsilon^2 < 2\epsilon\theta\} = P\{-\epsilon' < T^2 - \theta^2 - \epsilon^2 < \epsilon'\} \quad \text{where } \epsilon' = 2\epsilon\theta$$

$$= P\{|T' - \theta^2| < \epsilon'\} \quad \text{where } T' = T^2 - \epsilon^2$$

$$\to 1 \text{ as } n \to \infty.$$

As $\epsilon \to 0$, $T' = T^2 - \epsilon^2 \to T^2$; therefore $P\{|T^2 - \theta^2| < \epsilon'\} \to 1$ as $n \to \infty$. Thus $T^2$ is a consistent estimator of $\theta^2$.


    2.8 Unbiased Estimator

For any statistic $g(T)$, if the mathematical expectation is equal to a parametric function $\gamma(\theta)$, then $g(T)$ is called an unbiased estimator of $\gamma(\theta)$, i.e.,

$$E_\theta[g(T)] = \gamma(\theta) \quad \forall\ \theta \in \Omega.$$

Otherwise, the statistic $g(T)$ is said to be a biased estimator of $\gamma(\theta)$. An unbiased estimator is also called a zero-bias estimator. A statistic $g(T)$ is said to be an asymptotically unbiased estimator if $E_\theta[g(T)] \to \gamma(\theta)$ as $n \to \infty$, $\theta \in \Omega$.

Example 2.8 A random variable X has the pdf

$$p_\theta(x) = \begin{cases} 2\theta x & \text{if } 0 < x < 1 \\ 1 - \theta & \text{if } 1 \le x < 2,\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

Show that $g(X)$, a measurable function of X, is an unbiased estimator of $\theta$ if and only if $\int_0^1 x g(x)\, dx = \frac{1}{2}$ and $\int_1^2 g(x)\, dx = 0$.

Assume $g(X)$ is an unbiased estimator of $\theta$, i.e.,

$$E_\theta[g(X)] = \int_0^1 g(x)\, 2\theta x\, dx + \int_1^2 g(x)(1 - \theta)\, dx = \theta$$

$$\theta\left[\int_0^1 2x g(x)\, dx - \int_1^2 g(x)\, dx\right] + \int_1^2 g(x)\, dx = \theta \quad \forall\ \theta \in \Omega$$

$$\Rightarrow\ \int_0^1 2x g(x)\, dx - \int_1^2 g(x)\, dx = 1 \quad \text{and} \quad \int_1^2 g(x)\, dx = 0,$$

i.e., $\int_0^1 x g(x)\, dx = \frac{1}{2}$ and $\int_1^2 g(x)\, dx = 0$.

Conversely, if $\int_0^1 x g(x)\, dx = \frac{1}{2}$ and $\int_1^2 g(x)\, dx = 0$, then $g(X)$ is an unbiased estimator of $\theta$:

$$E_\theta[g(X)] = \int_0^1 2\theta x\, g(x)\, dx + \int_1^2 (1 - \theta)\, g(x)\, dx = 2\theta \int_0^1 x g(x)\, dx + (1 - \theta) \int_1^2 g(x)\, dx = 2\theta \cdot \frac{1}{2} + (1 - \theta) \cdot 0 = \theta.$$


Thus $g(X)$ is an unbiased estimator of $\theta$.

Example 2.9 If T denotes the number of successes in n independent and identical trials of an experiment with probability of success $\theta$, obtain unbiased estimators of $\theta^2$ and $\theta(1 - \theta)$, $0 < \theta < 1$.

Let $X_i \sim b(1, \theta)$, $i = 1, 2, \ldots, n$; then $T = \sum_{i=1}^{n} X_i \sim b(n, \theta)$. If $g(T)$ is an unbiased estimator of $\gamma(\theta) = \theta(1 - \theta)$, then $E_\theta[g(T)] = \theta(1 - \theta)$:

$$\sum_{t=0}^{n} g(t) \binom{n}{t} \theta^t (1 - \theta)^{n-t} = \theta(1 - \theta)$$

$$\sum_{t=0}^{n} g(t) \binom{n}{t} \left(\frac{\theta}{1 - \theta}\right)^t = \theta(1 - \theta)^{1-n}$$

Consider $\rho = \frac{\theta}{1 - \theta}$, so that $\theta = \frac{\rho}{1 + \rho}$ and $1 - \theta = \frac{1}{1 + \rho}$:

$$\sum_{t=0}^{n} g(t) \binom{n}{t} \rho^t = \frac{\rho}{1 + \rho}\left(\frac{1}{1 + \rho}\right)^{1-n} = \rho (1 + \rho)^{n-2} = \rho\left[1 + \binom{n-2}{1}\rho + \binom{n-2}{2}\rho^2 + \cdots + \rho^{n-2}\right]$$

Equating the coefficients of $\rho^t$ on both sides,

$$g(t) \binom{n}{t} = \binom{n-2}{t-1}$$

$$g(t) = \frac{(n-2)!}{(t-1)!(n-t-1)!} \cdot \frac{t!(n-t)!}{n!} = \frac{(n-2)!\, t(t-1)!\, (n-t)(n-t-1)!}{(t-1)!\, n(n-1)(n-2)!\, (n-t-1)!} = \frac{t(n-t)}{n(n-1)}, \quad n = 2, 3, \ldots$$

Thus the unbiased estimator of $\theta(1 - \theta)$ is

$$\frac{T(n - T)}{n(n - 1)}, \quad n = 2, 3, \ldots$$


Let the unbiased estimator of $\theta^2$ be given by $E_\theta[g(T)] = \theta^2$:

$$\sum_{t=0}^{n} g(t) \binom{n}{t} \left(\frac{\theta}{1 - \theta}\right)^t (1 - \theta)^n = \theta^2$$

$$\sum_{t=0}^{n} g(t) \binom{n}{t} \rho^t = \theta^2 (1 - \theta)^{-n} = \rho^2 (1 + \rho)^{n-2} = \rho^2\left[1 + \binom{n-2}{1}\rho + \cdots + \binom{n-2}{t-2}\rho^{t-2} + \cdots + \rho^{n-2}\right]$$

$$\Rightarrow\ g(t) \binom{n}{t} = \binom{n-2}{t-2}$$

$$g(t) = \frac{(n-2)!}{(t-2)!(n-t)!} \cdot \frac{t!(n-t)!}{n!} = \frac{(n-2)!\, t(t-1)(t-2)!}{(t-2)!\, n(n-1)(n-2)!} = \frac{t(t-1)}{n(n-1)}, \quad n = 2, 3, \ldots$$

Thus the unbiased estimator of $\theta^2$ is

$$g(T) = \frac{T(T - 1)}{n(n - 1)}, \quad n = 2, 3, \ldots$$
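Since T has finite support, the unbiasedness claims of Example 2.9 can be verified exactly rather than by simulation. A small Python check (n = 7 and $\theta$ = 0.3 are arbitrary test values):

```python
from math import comb

def expect(g, n, theta):
    # Exact E[g(T)] for T ~ Binomial(n, theta): finite sum over the support
    return sum(g(t) * comb(n, t) * theta**t * (1 - theta)**(n - t)
               for t in range(n + 1))

n, theta = 7, 0.3
g_sq = lambda t: t * (t - 1) / (n * (n - 1))   # proposed estimator of theta^2
g_pq = lambda t: t * (n - t) / (n * (n - 1))   # proposed estimator of theta(1-theta)
print(expect(g_sq, n, theta), theta**2)              # both 0.09
print(expect(g_pq, n, theta), theta * (1 - theta))   # both 0.21
```

Both printed pairs agree, confirming $E[g(T)] = \theta^2$ and $E[g(T)] = \theta(1-\theta)$ exactly.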

Example 2.10 Obtain an unbiased estimator of $\frac{1}{\theta}$, given a single sample observation from a Geometric population with pmf

$$p_\theta(x) = \begin{cases} \theta (1 - \theta)^{x-1} & x = 1, 2, 3, \ldots,\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

If $g(X)$ is unbiased for $\frac{1}{\theta}$, then

$$E_\theta[g(X)] = \sum_{x=1}^{\infty} g(x)\, \theta (1 - \theta)^{x-1} = \frac{1}{\theta} \quad \Rightarrow \quad \sum_{x=1}^{\infty} g(x) (1 - \theta)^{x-1} = \frac{1}{\theta^2}.$$

Take $q = 1 - \theta$, so that $\theta = 1 - q$:

$$\sum_{x=1}^{\infty} g(x)\, q^{x-1} = (1 - q)^{-2} = 1 + 2q + 3q^2 + \cdots + x q^{x-1} + \cdots$$

Equating coefficients, $g(x) = x$, $x = 1, 2, 3, \ldots$. Thus $g(X) = X$ is the unbiased estimator of $\frac{1}{\theta}$.
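The same kind of exact check works for Example 2.10; the support is infinite, so the sketch below truncates it far into the tail ($\theta = 0.4$ is an arbitrary test value):

```python
import numpy as np

theta = 0.4
x = np.arange(1, 200)                       # support truncated deep in the tail
pmf = theta * (1 - theta) ** (x - 1)
print(np.sum(x * pmf), 1 / theta)           # E[X] = 2.5 = 1/theta (to rounding)
```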


An unbiased estimator need not exist

Example 2.11 Assume $X \sim b(1, \theta)$, $0 < \theta < 1$. Given a single observation x of X from a Bernoulli population, no unbiased estimator of $\theta^2$ exists.

$$p_\theta(x) = \begin{cases} \theta^x (1 - \theta)^{1-x} & x = 0, 1 \text{ and } 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$

Let there be an unbiased estimator of $\theta^2$, say $g(X)$. That is, $E_\theta[g(X)] = \theta^2$:

$$\sum_{x=0}^{1} g(x)\, \theta^x (1 - \theta)^{1-x} = \theta^2$$

$$g(0)(1 - \theta) + g(1)\theta = \theta^2 \quad \Rightarrow \quad \theta[g(1) - g(0)] + g(0) = \theta^2 \quad \forall\ \theta \in (0, 1).$$

The left side is a polynomial of degree at most one in $\theta$, so equating coefficients gives $g(1) = 0$ and $g(0) = 0$, i.e., $g(x) = 0$ for $x = 0, 1$. Then $\theta^2 = 0$ for $x = 0$ or $x = 1$, whereas $\theta^2$ actually lies between 0 and 1. Hence an unbiased estimator of $\theta^2$ does not exist.

Example 2.12 If $X \sim b(n, \theta)$, show that no unbiased estimator of the parametric function $\frac{1}{\theta}$ exists.

Suppose $E_\theta[g(X)] = \frac{1}{\theta}$:

$$\sum_{x=0}^{n} g(x) \frac{n!}{x!(n-x)!}\, \theta^x (1 - \theta)^{n-x} = \frac{1}{\theta}$$

$$\sum_{x=0}^{n} g(x) \frac{n!}{x!(n-x)!}\, \rho^x = \frac{(1 + \rho)^{n+1}}{\rho} \quad \text{where } \rho = \frac{\theta}{1 - \theta}.$$

As $\theta \to 0$, $\rho \to 0$: the left side tends to $g(0)$, a finite quantity, while $\frac{(1+\rho)^{n+1}}{\rho} \to \infty$. Thus no unbiased estimator of the parametric function $\frac{1}{\theta}$ exists.

    Unbiased estimator is unique

Example 2.13 A random sample X is drawn from a Bernoulli population $b(1, \theta)$, $\Omega = \{\frac{1}{4}, \frac{1}{2}\}$. Then there exists a unique unbiased estimator of $\theta^2$.

Let $E_\theta[g(X)] = \theta^2$, i.e., $\sum_{x=0}^{1} g(x)\, \theta^x (1 - \theta)^{1-x} = \theta^2$.

When $\theta = \frac{1}{4}$:

$$3g(0) + g(1) = \frac{1}{4} \tag{2.1}$$

When $\theta = \frac{1}{2}$:

$$g(0) + g(1) = \frac{1}{2} \tag{2.2}$$

Solving equations (2.1) and (2.2) for $g(0)$ and $g(1)$ gives $g(0) = -\frac{1}{8}$ and $g(1) = \frac{5}{8}$,

$$\text{i.e., } g(x) = \begin{cases} -\frac{1}{8} & \text{for } x = 0 \\ \frac{5}{8} & \text{for } x = 1 \end{cases}$$

Thus this $g(X)$ is the unbiased estimator of $\theta^2$ on $\Omega = \{\frac{1}{4}, \frac{1}{2}\}$, and it is unique.
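The two linear equations of Example 2.13 can also be solved mechanically; a small numpy sketch (purely illustrative):

```python
import numpy as np

# Unbiasedness of g(X) for theta^2 at theta = 1/4 and theta = 1/2:
#   (3/4) g(0) + (1/4) g(1) = 1/16  and  (1/2) g(0) + (1/2) g(1) = 1/4
A = np.array([[0.75, 0.25],
              [0.50, 0.50]])
b = np.array([1 / 16, 1 / 4])
g0, g1 = np.linalg.solve(A, b)
print(g0, g1)   # -0.125 and 0.625, i.e. g(0) = -1/8, g(1) = 5/8
```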

    Unbiased estimator is not unique

Example 2.14 Let $X_1, X_2, \ldots, X_n$ be an iid random sample drawn from a population with Poisson distribution $P(\lambda)$. Then $g_1(X) = \bar{X}$ and $g_2(X) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are two unbiased estimators of $\lambda$. Consider the statistic $g(X) = \alpha g_1(X) + (1 - \alpha) g_2(X)$, $0 < \alpha < 1$. Then $E[g(X)] = \lambda$ for every such $\alpha$, so this unbiased estimator is not unique. Thus an unbiased estimator is not unique.

Example 2.15 Show that the mean $\bar{X}$ of a random sample of size n drawn from a population with probability density function

$$p_\theta(x) = \begin{cases} \frac{1}{\theta} e^{-\frac{x}{\theta}} & 0 < x < \infty,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

is an unbiased estimator of $\theta$ and has variance $\frac{\theta^2}{n}$.

Let $T = \sum_{i=1}^{n} X_i \sim G(n, \theta)$. The pdf of T is

$$p_\theta(t) = \begin{cases} \frac{1}{\Gamma n\, \theta^n} e^{-\frac{t}{\theta}}\, t^{n-1} & 0 < t < \infty,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$

$$E[T] = \int_{0}^{\infty} \frac{1}{\Gamma n\, \theta^n} e^{-\frac{t}{\theta}}\, t^{n+1-1}\, dt = \frac{\Gamma(n+1)\, \theta^{n+1}}{\Gamma n\, \theta^n} = n\theta$$

$$E\left[\sum_{i=1}^{n} X_i\right] = n\theta,\ \theta > 0 \ \Rightarrow\ E[n\bar{X}] = n\theta \ \Rightarrow\ E[\bar{X}] = \theta,\ \theta > 0.$$

Similarly $E[T^2] = n(n+1)\theta^2$, so $V[T] = n\theta^2$, $\theta > 0$.

$$\therefore\ V[\bar{X}] = V\left[\frac{\sum_{i=1}^{n} X_i}{n}\right] = \frac{1}{n^2} V[T] = \frac{1}{n^2}\, n\theta^2 = \frac{\theta^2}{n}.$$


Example 2.16 Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a normal population with mean zero and variance $\sigma^2$, $0 < \sigma^2 < \infty$. Show that $\frac{\sum_{i=1}^{n} X_i^2}{n}$ is an unbiased estimator of $\sigma^2$ and has variance $\frac{2\sigma^4}{n}$.

Define $ns^2 = \sum_{i=1}^{n} X_i^2$; then $Y = \frac{ns^2}{\sigma^2}$ has the $\chi^2$ distribution with n degrees of freedom, i.e., $Y \sim G(\frac{n}{2}, \frac{1}{2})$:

$$p(y) = \begin{cases} \frac{1}{2^{\frac{n}{2}}\, \Gamma\frac{n}{2}} e^{-\frac{1}{2}y}\, y^{\frac{n}{2}-1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$

$$E[Y] = \int_{0}^{\infty} \frac{1}{2^{\frac{n}{2}}\, \Gamma\frac{n}{2}} e^{-\frac{1}{2}y}\, y^{\frac{n}{2}+1-1}\, dy = \frac{1}{2^{\frac{n}{2}}\, \Gamma\frac{n}{2}} \cdot \frac{\Gamma(\frac{n}{2}+1)}{(\frac{1}{2})^{\frac{n}{2}+1}} = n$$

$$E[Y^2] = n^2 + 2n, \qquad V[Y] = 2n.$$

But $Y = \frac{ns^2}{\sigma^2}$:

$$\therefore\ E_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = n \ \Rightarrow\ E_{\sigma^2}[s^2] = \sigma^2.$$

Thus $\frac{\sum X_i^2}{n}$ is an unbiased estimator of $\sigma^2$.

$$V_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = 2n \ \Rightarrow\ \frac{n^2}{\sigma^4}\, V_{\sigma^2}[s^2] = 2n \ \Rightarrow\ V_{\sigma^2}[s^2] = \frac{2\sigma^4}{n}.$$

Example 2.17 Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 drawn from a uniform population with pdf

$$p_\theta(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$

Show that $4Y_1$ and $2Y_2$ are unbiased estimators of $\theta$. Also find the variances of these estimators.

The pdf of $Y_1$ is

$$p_\theta(y_1) = \begin{cases} \frac{3!}{1!\,2!}\, \frac{1}{\theta}\left[\int_{y_1}^{\theta} \frac{1}{\theta}\, dx\right]^2 = \frac{3}{\theta}\left[1 - \frac{y_1}{\theta}\right]^2 & 0 < y_1 < \theta \\ 0 & \text{otherwise} \end{cases}$$

$$E[Y_1] = \frac{3}{\theta} \int_{0}^{\theta} y_1\left(1 - \frac{y_1}{\theta}\right)^2 dy_1 = 3\theta \int_{0}^{1} t(1 - t)^2\, dt \quad \text{where } y_1 = \theta t$$

$$= 3\theta \int_{0}^{1} t^{2-1}(1 - t)^{3-1}\, dt = 3\theta\, \frac{\Gamma 2\, \Gamma 3}{\Gamma 5} = \frac{\theta}{4},\ \theta > 0,$$

so $E[4Y_1] = \theta$. Similarly $E[Y_1^2] = \frac{\theta^2}{10}$ and $V[Y_1] = \frac{\theta^2}{10} - \frac{\theta^2}{16} = \frac{3\theta^2}{80}$,

$$\therefore\ V[4Y_1] = \frac{3\theta^2}{5}.$$

The pdf of $Y_2$ is

$$p_\theta(y_2) = \frac{3!}{1!\,1!\,1!}\, \frac{1}{\theta}\left[\int_{0}^{y_2} \frac{1}{\theta}\, dx\right]\left[\int_{y_2}^{\theta} \frac{1}{\theta}\, dx\right] = \begin{cases} \frac{6}{\theta^2}\, y_2\left[1 - \frac{y_2}{\theta}\right] & 0 < y_2 < \theta \\ 0 & \text{otherwise} \end{cases}$$

$$\therefore\ E[Y_2] = \frac{\theta}{2},$$

so $2Y_2$ is an unbiased estimator of $\theta$; and $E[Y_2^2] = \frac{3\theta^2}{10}$, $V[Y_2] = \frac{\theta^2}{20}$,

$$V[2Y_2] = \frac{\theta^2}{5}.$$
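A simulation sketch for Example 2.17 ($\theta$, the seed, and the replication count are arbitrary): it draws many samples of size 3, sorts them to obtain the order statistics, and checks the means and variances of $4Y_1$ and $2Y_2$:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, reps = 2.0, 200_000
y = np.sort(rng.uniform(0.0, theta, size=(reps, 3)), axis=1)  # order statistics
y1, y2 = y[:, 0], y[:, 1]
print("4Y1:", (4 * y1).mean(), (4 * y1).var())   # approx theta and 3*theta^2/5
print("2Y2:", (2 * y2).mean(), (2 * y2).var())   # approx theta and theta^2/5
```

With $\theta = 2$ the targets are $E = 2$, $V[4Y_1] = \frac{3\theta^2}{5} = 2.4$ and $V[2Y_2] = \frac{\theta^2}{5} = 0.8$.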

Example 2.18 Let $Y_1$ and $Y_2$ be two independent and unbiased estimators of $\theta$. If the variance of $Y_1$ is twice the variance of $Y_2$, find the constants $k_1$ and $k_2$ so that $k_1 Y_1 + k_2 Y_2$ is an unbiased estimator of $\theta$ with the smallest possible variance for such a linear combination.

Given $E[Y_1] = \theta$, $E[Y_2] = \theta$, $V[Y_1] = 2\sigma^2$ and $V[Y_2] = \sigma^2$. Also $E[k_1 Y_1 + k_2 Y_2] = \theta$ gives

$$k_1 E[Y_1] + k_2 E[Y_2] = \theta \ \Rightarrow\ k_1 + k_2 = 1,\ \text{i.e., } k_2 = 1 - k_1.$$

Consider

$$\phi = V[k_1 Y_1 + k_2 Y_2] = k_1^2 V[Y_1] + k_2^2 V[Y_2] = k_1^2\, 2\sigma^2 + (1 - k_1)^2 \sigma^2 = 3k_1^2\sigma^2 - 2k_1\sigma^2 + \sigma^2.$$

Differentiating twice with respect to $k_1$,

$$\frac{d\phi}{dk_1} = 6k_1\sigma^2 - 2\sigma^2, \qquad \frac{d^2\phi}{dk_1^2} = 6\sigma^2.$$

For a minimum, $\frac{d\phi}{dk_1} = 0$ and $\frac{d^2\phi}{dk_1^2} > 0$:

$$6k_1\sigma^2 - 2\sigma^2 = 0 \ \Rightarrow\ k_1 = \frac{1}{3} \text{ and } k_2 = \frac{2}{3}.$$

Thus $\frac{1}{3}Y_1 + \frac{2}{3}Y_2$ has minimum variance.

    Consistent estimator need not be unbiased

Example 2.19 Let $X_1, X_2, \ldots, X_n$ be a sample of size n drawn from a normal population with mean $\mu$ and variance $\sigma^2$. Define $s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$; then $Y = \frac{ns^2}{\sigma^2}$ has the $\chi^2$ distribution with $(n-1)$ degrees of freedom, i.e., $Y \sim G(\frac{n-1}{2}, \frac{1}{2})$. It has the pdf

$$p(y) = \begin{cases} \frac{1}{2^{\frac{n-1}{2}}\, \Gamma\frac{n-1}{2}} e^{-\frac{1}{2}y}\, y^{\frac{n-1}{2}-1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$


$$E[Y^r] = \int_{0}^{\infty} \frac{1}{2^{\frac{n-1}{2}}\, \Gamma\frac{n-1}{2}} e^{-\frac{1}{2}y}\, y^{\frac{n-1}{2}+r-1}\, dy = \frac{1}{2^{\frac{n-1}{2}}\, \Gamma\frac{n-1}{2}} \cdot \frac{\Gamma(\frac{n-1}{2}+r)}{(\frac{1}{2})^{\frac{n-1}{2}+r}} = \frac{2^r}{\Gamma\frac{n-1}{2}}\, \Gamma\left(\frac{n-1}{2}+r\right)$$

When $r = 1$,

$$E[Y] = \frac{2}{\Gamma\frac{n-1}{2}} \cdot \frac{n-1}{2}\, \Gamma\frac{n-1}{2} = n - 1$$

$$\therefore\ E_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = n - 1 \ \Rightarrow\ E_{\sigma^2}[s^2] = \frac{n-1}{n}\sigma^2, \qquad \text{and} \qquad V_{\sigma^2}[s^2] = \frac{2(n-1)}{n^2}\sigma^4.$$

Thus $E_{\sigma^2}[s^2] \to \sigma^2$ and $V_{\sigma^2}[s^2] \to 0$ as $n \to \infty$,

so $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is a consistent estimator of $\sigma^2$. But $E_{\sigma^2}[s^2] \ne \sigma^2$, so $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is not an unbiased estimator of $\sigma^2$.

Example 2.20 Illustrate with an example that an estimator is both consistent and unbiased.

Let $X_1, X_2, \ldots, X_n$ be a random sample of size n drawn from a normal population with mean $\mu$ and variance $\sigma^2$. Define $s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$; then $Y = \frac{ns^2}{\sigma^2}$ has the $\chi^2$ distribution with $(n-1)$ degrees of freedom, i.e., $Y \sim G(\frac{n-1}{2}, \frac{1}{2})$, with $E_{\sigma^2}[s^2] = \frac{n-1}{n}\sigma^2$ and $V_{\sigma^2}[s^2] = \frac{2(n-1)}{n^2}\sigma^4$. Now

$$(n-1)S^2 = ns^2 \ \Rightarrow\ S^2 = \frac{n}{n-1}\, s^2$$

$$E_{\sigma^2}[S^2] = \frac{n}{n-1}\, E_{\sigma^2}[s^2] = \frac{n}{n-1} \cdot \frac{n-1}{n}\sigma^2 = \sigma^2$$

$$V_{\sigma^2}[S^2] = \frac{n^2}{(n-1)^2}\, V_{\sigma^2}[s^2] = \frac{n^2}{(n-1)^2} \cdot \frac{2(n-1)}{n^2}\sigma^4 = \frac{2\sigma^4}{n-1} \to 0 \text{ as } n \to \infty.$$

Thus $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is a consistent and also unbiased estimator of $\sigma^2$.
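The bias of $s^2$ and the unbiasedness of $S^2$ in Examples 2.19 and 2.20 show up clearly in a short simulation ($\sigma^2 = 4$, n = 5 and the seed are arbitrary choices; numpy's ddof argument selects the divisor):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, n, reps = 4.0, 5, 100_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = x.var(axis=1)           # divisor n:   biased, E = (n-1)*sigma2/n = 3.2
S2 = x.var(axis=1, ddof=1)   # divisor n-1: unbiased, E = sigma2 = 4.0
print(s2.mean(), S2.mean())
```

The first average sits near $(n-1)\sigma^2/n = 3.2$, the second near $\sigma^2 = 4$.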

Example 2.21 Give an example that an unbiased estimator need not be consistent.

Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a normal population with mean $\mu$ and known variance $\sigma^2$. Then the estimator $X_1$ (the first observation of the sample) is unbiased but not consistent, since $E[X_1] = \mu$, $V[X_1] = \sigma^2$, and

$$P\{|X_1 - \mu| < \epsilon\} = P\{-\epsilon < X_1 - \mu < \epsilon\} = P\{\mu - \epsilon < X_1 < \mu + \epsilon\} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\mu-\epsilon}^{\mu+\epsilon} e^{-\frac{1}{2\sigma^2}(x_1 - \mu)^2}\, dx_1 \nrightarrow 1 \text{ as } n \to \infty,$$

since this probability does not involve n at all.

$\therefore\ X_1$ is an unbiased but not consistent estimator of $\mu$.

Example 2.22 Give an example that an estimator is neither consistent nor unbiased.

Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 drawn from a uniform population whose pdf for given $\theta$ is

$$p_\theta(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$

Then $Y_1$ is neither a consistent nor an unbiased estimator of $\theta$, since $E[Y_1] = \frac{\theta}{4} \ne \theta$ and

$$P\left\{\left|Y_1 - \frac{\theta}{4}\right| < \epsilon\right\} = P\left\{\frac{\theta}{4} - \epsilon < Y_1 < \frac{\theta}{4} + \epsilon\right\} = \frac{3}{\theta} \int_{\frac{\theta}{4}-\epsilon}^{\frac{\theta}{4}+\epsilon} \left(1 - \frac{y_1}{\theta}\right)^2 dy_1 \nrightarrow 1 \text{ as } n \to \infty.$$

Thus $Y_1$, the first order statistic, is neither a consistent nor an unbiased estimator of $\theta$.

    2.9 Sufficient Statistic

A sufficient statistic conveys as much information about the distribution of a random variable as is contained in the sample. It helps to identify a family of distributions only, and not the particular parameters of the distributions.

Definition 2.1 Let $X_1, X_2, \ldots, X_n$ be a random sample of size n drawn from a population with pdf $p_\theta(x)$, $\theta \in \Omega$. Let $T = t(X)$ be a statistic whose pdf is $p_\theta(t)$. For a continuous random variable X, $T = t(X)$ is said to be a sufficient statistic iff

$$\frac{p_\theta(x_1, x_2, \ldots, x_n)}{p_\theta(t)}$$

is independent of $\theta$ for every given $T = t$. Similarly, for a discrete random variable X, $T = t(X)$ is said to be a sufficient statistic iff

$$P_\theta\{X_1 = x_1, X_2 = x_2, \ldots \mid T = t\}$$


is independent of $\theta$ for every given $T = t$.

Example 2.23 Let X be a single observation from a population with pmf $p_\theta(x)$, $0 < \theta < 1$:

$$p_\theta(x) = \begin{cases} \dfrac{\theta^{|x|}(1-\theta)^{|x|}}{2} & x = -1, 1 \\ 1 - \theta(1-\theta) & x = 0 \\ 0 & \text{otherwise} \end{cases}$$

Show that $|X|$ is sufficient.

Let $Y = |X|$. Then $P\{Y = 0\} = P\{|X| = 0\} = P\{X = 0\} = 1 - \theta(1-\theta)$ and $P\{Y = 1\} = P\{|X| = 1\} = P\{X = -1 \text{ or } X = 1\} = P\{X = -1\} + P\{X = 1\} = \theta(1-\theta)$. Consider

$$P\{X = 1 \mid Y = 1\} = \frac{P\{X = 1 \cap Y = 1\}}{P\{Y = 1\}} = \frac{P\{X = 1\}}{P\{Y = 1\}} = \frac{\theta(1-\theta)/2}{\theta(1-\theta)} = \frac{1}{2},$$

which is independent of $\theta$. Therefore $Y = |X|$ is sufficient.

Example 2.24 Let $X_1, X_2, \ldots, X_n$ be an independent random sample drawn from a population with pdf

$$p_\theta(x) = \begin{cases} e^{i\theta - x} & x > i\theta,\ i = 1, 2, 3, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$

Show that $T = \min_{1 \le i \le n} \frac{X_i}{i}$ is a sufficient statistic.

Let $y = \frac{x}{i}$; then $dx = i\, dy$. Given $p_\theta(x) = e^{-i[\frac{x}{i} - \theta]}$, i.e., $p_\theta(y) = i\, e^{-i[y - \theta]}$, $y > \theta$. Take $T = \min_{1 \le i \le n} Y_i$, where $Y_i = \frac{X_i}{i}$. Since $P_\theta\{T > t\} = \prod_{i=1}^{n} e^{-i(t-\theta)} = e^{-(\sum_i i)(t-\theta)}$, the pdf of T is

$$p_\theta(t) = \left(\sum_{i=1}^{n} i\right) e^{-(\sum_i i)(t - \theta)}, \quad \theta < t < \infty$$

$$\frac{p_\theta(x_1, x_2, \ldots, x_n)}{p_\theta(t)} = \frac{e^{\theta \sum_i i - \sum_i x_i}}{\left(\sum_i i\right) e^{-(\sum_i i)(t-\theta)}} = \frac{1}{\sum_i i}\, e^{(\sum_i i)\, t - \sum_i x_i},$$

which is independent of $\theta$. Thus $T = \min_{1 \le i \le n} Y_i = \min_{1 \le i \le n} \frac{X_i}{i}$ is sufficient.

Example 2.25 Let $X_1$ and $X_2$ be iid Poisson random variables with parameter $\lambda$. Prove that

    (i) X1 + X2 is a sufficient statistic.

    (ii) X1 + 2X2 is not a sufficient statistic.

(i) Given that

$$P\{X_1 = x_1\} = \begin{cases} \frac{e^{-\lambda} \lambda^{x_1}}{x_1!} & x_1 = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad P\{X_2 = x_2\} = \begin{cases} \frac{e^{-\lambda} \lambda^{x_2}}{x_2!} & x_2 = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

Let $T = X_1 + X_2$; then

$$P\{T = t\} = \begin{cases} \frac{e^{-2\lambda} (2\lambda)^t}{t!} & t = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

Consider

$$P\{X_1 = x_1, X_2 = x_2 \mid T = t\} = \frac{P\{X_1 = x_1, X_2 = t - x_1\}}{P\{T = t\}} = \frac{P\{X_1 = x_1\}\, P\{X_2 = t - x_1\}}{P\{T = t\}} = \frac{\frac{e^{-\lambda}\lambda^{x_1}}{x_1!} \cdot \frac{e^{-\lambda}\lambda^{t-x_1}}{(t-x_1)!}}{\frac{e^{-2\lambda}(2\lambda)^t}{t!}} = \frac{t!}{(t - x_1)!\, x_1!\, 2^t},$$

which is independent of $\lambda$.

$\therefore\ X_1 + X_2$ is a sufficient statistic.

(ii) Consider

$$P\{X_1 + 2X_2 = 2\} = P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\} = P\{X_1 = 0\}P\{X_2 = 1\} + P\{X_1 = 2\}P\{X_2 = 0\} = \lambda e^{-2\lambda} + \frac{\lambda^2}{2} e^{-2\lambda} = \lambda e^{-2\lambda}\left[1 + \frac{\lambda}{2}\right]$$

Therefore

$$P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + 2X_2 = 2\}} = \frac{\lambda e^{-2\lambda}}{\lambda e^{-2\lambda}\left[1 + \frac{\lambda}{2}\right]} = \frac{2}{2 + \lambda},$$

which depends on $\lambda$.

$\therefore\ X_1 + 2X_2$ is not a sufficient statistic.
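The conditional-probability computations of Example 2.25 can be reproduced exactly by summing the joint Poisson pmf over the relevant event (a small sketch; the truncation bound 40 and the λ values are arbitrary):

```python
from math import exp, factorial

def poisson(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def cond_prob(x1, x2, stat, value, lam, bound=40):
    # P{X1=x1, X2=x2 | stat(X1,X2)=value} for an independent Poisson(lam) pair
    num = poisson(x1, lam) * poisson(x2, lam) if stat(x1, x2) == value else 0.0
    den = sum(poisson(a, lam) * poisson(b, lam)
              for a in range(bound) for b in range(bound)
              if stat(a, b) == value)
    return num / den

for lam in (0.5, 2.0):
    print(cond_prob(1, 1, lambda a, b: a + b, 2, lam),       # 0.5 for every lam
          cond_prob(0, 1, lambda a, b: a + 2 * b, 2, lam))   # 2/(2+lam): varies
```

$P\{X_1 = 1, X_2 = 1 \mid X_1 + X_2 = 2\} = \frac{1}{2}$ for every $\lambda$, whereas $P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{2}{2+\lambda}$ changes with $\lambda$.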


Example 2.26 Let $X_1$ and $X_2$ be two independent Bernoulli random variables such that $P\{X_1 = 1\} = 1 - P\{X_1 = 0\} = \theta$, $0 < \theta < 1$, and $P\{X_2 = 1\} = 1 - P\{X_2 = 0\} = 2\theta$, $0 < \theta \le \frac{1}{2}$. Show that $X_1 + X_2$ is not a sufficient statistic.

Let $T = X_1 + X_2$. Consider

$$P\{T = 1\} = P\{X_1 + X_2 = 1\} = P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 1, X_2 = 0\} = (1 - \theta)2\theta + \theta(1 - 2\theta) = \theta(3 - 4\theta)$$

$$\therefore\ P\{X_1 = 0 \mid X_1 + X_2 = 1\} = \frac{P\{X_1 = 0 \cap X_1 + X_2 = 1\}}{P\{X_1 + X_2 = 1\}} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + X_2 = 1\}} = \frac{(1 - \theta)2\theta}{\theta(3 - 4\theta)} = \frac{2(1 - \theta)}{(3 - 4\theta)},$$

which depends on $\theta$.

$\therefore\ X_1 + X_2$ is not a sufficient statistic.

Example 2.27 Let $X_1$ and $X_2$ denote a random sample drawn from a normal population $N(\theta, 1)$, $-\infty < \theta < \infty$. Show that $T = X_1 + X_2$ is a sufficient statistic.

The joint pdf of $X_1$ and $X_2$ is

$$p_\theta(x_1, x_2) = p_\theta(x_1)\, p_\theta(x_2) = \frac{1}{2\pi}\, e^{-\frac{1}{2}(x_1 - \theta)^2 - \frac{1}{2}(x_2 - \theta)^2}$$

Let $T = X_1 + X_2 \sim N(2\theta, 2)$:

$$p_\theta(t) = \begin{cases} \frac{1}{2\sqrt{\pi}}\, e^{-\frac{1}{4}(t - 2\theta)^2} & -\infty < t < \infty \\ 0 & \text{otherwise} \end{cases}$$

The definition of a sufficient statistic gives

$$\frac{p_\theta(x_1, x_2)}{p_\theta(t)} = \frac{\frac{1}{2\pi}\, e^{-\frac{1}{2}[x_1^2 + x_2^2 - 2\theta(x_1 + x_2) + 2\theta^2]}}{\frac{1}{2\sqrt{\pi}}\, e^{-\frac{1}{4}[t^2 - 4\theta t + 4\theta^2]}} = \frac{1}{\sqrt{\pi}} \cdot \frac{e^{-\frac{1}{2}(x_1^2 + x_2^2) + \theta(x_1 + x_2) - \theta^2}}{e^{-\frac{1}{4}(x_1 + x_2)^2 + \theta(x_1 + x_2) - \theta^2}} = \frac{1}{\sqrt{\pi}}\, e^{-\frac{1}{2}(x_1^2 + x_2^2) + \frac{1}{4}(x_1 + x_2)^2},$$

which is independent of $\theta$.

$\therefore\ T = X_1 + X_2$ is a sufficient statistic.

Example 2.28 Let $X_1, X_2, X_3$ be a sample from $B(1, \theta)$. Show that $X_1 X_2 + X_3$ is not sufficient.


Let $Y = X_1 X_2$ and $T = X_1 X_2 + X_3$; then

$$P\{Y = 0\} = P\{X_1 = 0, X_2 = 0\} + P\{X_1 = 1, X_2 = 0\} + P\{X_1 = 0, X_2 = 1\} = (1 - \theta)^2 + \theta(1 - \theta) + \theta(1 - \theta) = 1 - \theta^2$$

$$P\{Y = 1\} = P\{X_1 = 1, X_2 = 1\} = \theta^2$$

$$P\{Y + X_3 = 1\} = P\{Y = 0, X_3 = 1\} + P\{Y = 1, X_3 = 0\} = (1 - \theta^2)\theta + \theta^2(1 - \theta),$$

i.e., $P\{T = 1\} = \theta(1 - \theta)(1 + 2\theta)$. Consider

$$P\{Y = 1 \mid T = 1\} = \frac{P\{Y = 1, T = 1\}}{P\{T = 1\}} = \frac{P\{Y = 1\}\, P\{X_3 = 0\}}{P\{T = 1\}} = \frac{\theta^2(1 - \theta)}{\theta(1 - \theta)(1 + 2\theta)} = \frac{\theta}{1 + 2\theta},$$

which depends on $\theta$. Hence $X_1 X_2 + X_3$ is not a sufficient statistic.