Bayesian Methods of Estimation


MAKERERE UNIVERSITY

    COLLEGE OF ENGINEERING DESIGN ART AND TECHNOLOGY

    DEPARTMENT OF CIVIL ENGINEERING

    Math assignment

    Group members

NAME                       REGISTRATION NUMBER    STUDENT NUMBER
OLARA ALLAN                10/U/683               210001123
ARIKOD RICHARD             10/U/657               210001135
MUKIIZA JULIUS             10/U/671               210001151
NDARAMA MICHEAL SIMON      10/X/3007/PSA          210004611
TWEHEYO DISHAN             10/U/690               210001016
BULUMA MELINDA             10/U/662               210000809
NAMIYA MARIAM              10/U/676               210000345
SSONKO EMMANUEL            10/U/9979/PSA          210005498
BUYINZA ABBEY              10/U/663               210000879
ARIKE PATRICK              10/U/9989/PSA          210018734
TUKAMUSHABA EMMY           08/U/3053/PSA          208006302
ANGURA GABRIEL             10/U/9946/PSA          210018907
SEMAHORO ALLAN             10/U/9945/PSA          210006460
NANKABIRWA ROSE            10/U/678               210000348
MUTONGOLE SAMUEL           10/U/9965/PSA          210009531
NAMPEERA ROBINAH           10/U/677               210001032
KIGONYA ALLAN              10/U/668               210000683
OCAN GEOFREY               10/U/9971/PSA          210017525
MUTYOGOMA MAUYA            10/U/9998/PS           210006589
DRATELE SIGFRIED BUDRA     10/U/1914              210001946
MUKIIBI SSEMAKULA PETER    10/U/9964/PSA          210006993
KINENE SERWANGA BRIAN      10/U/687               210001541
OLUKA PATRICK              10/U/1002/PS           210006598


    BAYESIAN ESTIMATION OF DISTRIBUTION PARAMETERS

Introduction

Bayes' theorem is a theorem with two distinct interpretations. In the Bayesian interpretation, it expresses how a subjective degree of belief should rationally change to account for evidence. In the frequentist interpretation, it relates inverse representations of the probabilities concerning two events. Bayesian statistics has applications in fields including science, engineering, medicine and law.

Basics of the Bayesian estimation method

Consider the problem of finding a point estimate of the parameter θ for the population f(x; θ). The classical approach would be to take a random sample of size n and substitute the information provided by the sample into the appropriate estimator or decision function. For the case of a binomial population b(x; n, p), the estimate of p, the proportion of successes, would be p̂ = x/n.

Suppose that additional information is given about the parameter θ, namely, that θ is known to vary according to some probability distribution f(θ), often called the prior distribution, with prior mean μ₀ and prior variance σ₀². That is, we are now assuming θ to be a value of a random variable with probability distribution f(θ), and we wish to estimate the particular value of θ for the population from which we selected our sample.

The probabilities associated with this prior distribution are called subjective probabilities, in that they measure a person's degree of belief in the location of the parameter.

Bayesian techniques use the prior distribution f(θ), along with the joint distribution of the sample, to compute the posterior distribution f(θ | x₁, x₂, ..., xₙ). The posterior distribution combines information from the subjective prior distribution and the objective sample distribution, and it expresses the degree of belief in the location of the parameter θ after the sample has been observed.

If we denote by f(x₁, x₂, ..., xₙ | θ) the joint probability distribution of the sample, conditional on the parameter θ, in a situation where θ is a value of a random variable, then the joint distribution of the sample and the parameter is

f(x₁, x₂, ..., xₙ, θ) = f(x₁, x₂, ..., xₙ | θ) f(θ),


from which we readily obtain the marginal distribution

g(x₁, x₂, ..., xₙ) = Σ_θ f(x₁, x₂, ..., xₙ, θ)      (discrete case)
                   = ∫ f(x₁, x₂, ..., xₙ, θ) dθ     (continuous case).

Hence the posterior distribution may be written

f(θ | x₁, x₂, ..., xₙ) = f(x₁, x₂, ..., xₙ, θ) / g(x₁, x₂, ..., xₙ).

Note: the mean of the posterior distribution f(θ | x₁, x₂, ..., xₙ), denoted by θ*, is called the Bayes estimate of θ.
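As a small numerical illustration of these formulas (the prior values, sample size and observed count below are hypothetical, not from the text), suppose the binomial proportion p can take only two values with known prior probabilities, and x successes are observed in n trials. A minimal Python sketch of the prior, joint, marginal and posterior:

```python
from math import comb

# Hypothetical discrete prior on the binomial proportion p (values not from the text).
p_values = [0.1, 0.2]
prior    = [0.6, 0.4]

n, x = 2, 1   # observed: x successes in n trials

# Likelihood f(x | p), joint f(x, p) = f(x | p) f(p), marginal g(x) = sum of the joint.
likelihood = [comb(n, x) * p**x * (1 - p)**(n - x) for p in p_values]
joint      = [lk * pr for lk, pr in zip(likelihood, prior)]
marginal   = sum(joint)

# Posterior f(p | x) and the Bayes estimate p* = posterior mean.
posterior = [j / marginal for j in joint]
p_star    = sum(p * w for p, w in zip(p_values, posterior))

print(posterior)   # approximately [0.458, 0.542]
print(p_star)      # approximately 0.154
```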

The density f(θ | x₁, x₂, ..., xₙ) is called the posterior density.

Consider the Bayes estimation of the probability p of an event, where p is a realization of a random variable X with probability density function f_X(x) whose range is [0, 1]. A prior estimate of p can be obtained from

p̂ = E[X] = ∫₀¹ x f_X(x) dx.      (1)

To improve on the estimate of p, we conduct an experiment of tossing a die n times and observing the number of aces to be k. Applying Bayes' theorem, the posterior density is written as

f_{X|B}(x | B) = P(B | X = x) f_X(x) / P(B),      (2)

where B = {k aces in n tosses}. From the binomial probability law we obtain

P(B | X = x) = C(n, k) x^k (1 − x)^(n−k).      (3)

Substituting equation (3) into (2),

f_{X|B}(x | B) = C(n, k) x^k (1 − x)^(n−k) f_X(x) / ∫₀¹ C(n, k) u^k (1 − u)^(n−k) f_X(u) du.      (4)


The updated estimate of p can be obtained by substituting the posterior density from equation (4) for f_X(x) in equation (1):

p̂* = ∫₀¹ x f_{X|B}(x | B) dx.      (5)

Assuming that X is uniformly distributed in [0, 1], instead of having a general distribution in that range, equation (4) can be simplified. The integral

∫₀¹ x^k (1 − x)^(n−k) dx = k!(n − k)! / (n + 1)!      (6)

can be shown to be true using mathematical induction.

Substituting f_X(x) = 1 and using equation (6), we can evaluate P(B) as

P(B) = C(n, k) k!(n − k)! / (n + 1)! = 1 / (n + 1).      (7)

We can express the conditional density as follows:

f_{X|B}(x | B) = [(n + 1)! / (k!(n − k)!)] x^k (1 − x)^(n−k),  0 ≤ x ≤ 1.      (8)

The posterior estimate for p is obtained from equation (5), and from equation (6) it is given by

p̂* = [(n + 1)! / (k!(n − k)!)] ∫₀¹ x^(k+1) (1 − x)^(n−k) dx = (k + 1) / (n + 2).
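To make the result concrete, the following short Python sketch (using the hypothetical values n = 60 tosses with k = 12 aces, which are not from the text) integrates the posterior density of equation (8) numerically and compares the posterior mean with the closed-form estimate (k + 1)/(n + 2):

```python
import numpy as np
from math import factorial

# Hypothetical observation (not from the text): k aces in n tosses of a die.
n, k = 60, 12

# Posterior density from equation (8), assuming a uniform prior on [0, 1].
coeff = factorial(n + 1) / (factorial(k) * factorial(n - k))
x = np.linspace(0.0, 1.0, 200_001)
posterior = coeff * x**k * (1.0 - x)**(n - k)

# Equation (5): the updated estimate of p is the posterior mean.
dx = x[1] - x[0]
p_numeric = np.sum(x * posterior) * dx
p_closed_form = (k + 1) / (n + 2)

print(p_numeric, p_closed_form)   # both approximately 0.2097
```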

Theorem

Bayesian methods of estimation concerning the mean of a normal population are based on the following theorem.

If x̄ is the mean of a random sample of size n from a normal population with known variance σ², and the prior distribution of the population mean μ is a normal distribution with prior mean μ₀ and prior variance σ₀², then the posterior distribution of the population mean is also a normal distribution, with mean μ* and standard deviation σ*, where

μ* = (n x̄ σ₀² + μ₀ σ²) / (n σ₀² + σ²)   and   σ* = √( σ₀² σ² / (n σ₀² + σ²) ).

The posterior mean μ* is the Bayes estimate of the population mean μ, and a 100(1 − α)% Bayesian interval for μ can be constructed by computing the interval


μ* − z_{α/2} σ* < μ < μ* + z_{α/2} σ*,

which is centered at the posterior mean and contains 100(1 − α)% of the posterior probability.

    Example

An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed with a standard deviation of 100 hours. Prior experience leads us to believe that μ is a value of a normal random variable with mean μ₀ = 800 hours and standard deviation σ₀ = 10 hours. If a random sample of 25 bulbs has an average life of 780 hours, find a 95% Bayesian interval for μ.

    Solution

The posterior distribution of the mean μ is also normal, with mean

μ* = ((25)(780)(10²) + (800)(100²)) / ((25)(10²) + (100²)) = 796

and standard deviation

σ* = √( (10²)(100²) / ((25)(10²) + (100²)) ) = √80 = 8.944.

The 95% Bayesian interval for μ is then given by

796 − (1.96)(8.944) < μ < 796 + (1.96)(8.944),  i.e.  778.5 < μ < 813.5.

By ignoring the prior information about μ in the example above, one could instead construct the classical 95% confidence interval

780 − (1.96)(100/√25) < μ < 780 + (1.96)(100/√25),  i.e.  740.8 < μ < 819.2,

which is considerably wider than the corresponding Bayesian interval.
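A minimal Python sketch of this calculation, using only the numbers given in the example, reproduces the posterior mean, the posterior standard deviation and both intervals:

```python
import math

# Values from the light-bulb example in the text.
sigma  = 100.0   # known population standard deviation (hours)
mu0    = 800.0   # prior mean (hours)
sigma0 = 10.0    # prior standard deviation (hours)
n      = 25      # sample size
xbar   = 780.0   # sample mean (hours)
z      = 1.96    # z value for a 95% interval

# Posterior mean and standard deviation from the theorem.
mu_star    = (n * xbar * sigma0**2 + mu0 * sigma**2) / (n * sigma0**2 + sigma**2)
sigma_star = math.sqrt(sigma0**2 * sigma**2 / (n * sigma0**2 + sigma**2))

bayes_lo, bayes_hi = mu_star - z * sigma_star, mu_star + z * sigma_star
classic_lo, classic_hi = xbar - z * sigma / math.sqrt(n), xbar + z * sigma / math.sqrt(n)

print(mu_star, sigma_star)      # 796.0, ~8.94
print(bayes_lo, bayes_hi)       # ~778.5, ~813.5
print(classic_lo, classic_hi)   # 740.8, 819.2
```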


    Disadvantages of Bayesian Estimation

i. You can get very different posterior distributions by changing which parameters are given uninformative priors; in other words, there are some tricky mechanical issues.

ii. The frequentist framework suits the Popperian view of science because it allows you to falsify hypotheses. Under Bayesian statistics there is no such thing as falsification, only relative degrees of belief.

iii. Frequentist statistics is "easy" and has accepted conventions of method and notation. The same cannot be said of Bayesian statistics, which requires a sound understanding of probability and likelihood.


    VECTOR RANDOM VARIABLES

A random matrix (or random vector) is a matrix (vector) whose elements are random variables; its elements are jointly distributed. Two random matrices X1 and X2 are independent if the elements of X1 (as a collection of random variables) are independent of the elements of X2; the elements within X1 or X2 do not themselves have to be independent. Similarly, a collection of random matrices X1, ..., Xk is independent if their respective collections of random elements are (mutually) independent. (Again, the elements within any one of the random matrices need not be independent.) Likewise, an infinite collection of random matrices is independent if every finite sub-collection is independent.

    Expectation (mean) of a random matrix

The expected value or mean of an m × n random matrix X is the m × n matrix E(X) whose elements are the expected values of the corresponding elements of X, assuming they all exist. That is, if

X = [x_ij],  i = 1, ..., m,  j = 1, ..., n,

then

E(X) = [E(x_ij)].

Properties:

E(Xᵀ) = E(X)ᵀ.
If X is square, E(tr(X)) = tr(E(X)).
If a is a constant, E(aX) = a E(X).
E(vec(X)) = vec(E(X)).
If A and B are constant matrices, E(AXB) = A E(X) B.
E(X1 + X2) = E(X1) + E(X2).
If X1 and X2 are independent, E(X1 X2) = E(X1) E(X2).
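As a quick numerical check of the linearity properties above, the following Monte Carlo sketch uses a hypothetical 2 × 3 random matrix X with independent normal entries centred on a known mean matrix M (M, A and B are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2 x 3 random matrix X: independent normal entries centred on a
# known mean matrix M, so that E(X) = M exactly.
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
A = rng.normal(size=(4, 2))   # constant matrices (arbitrary illustrative values)
B = rng.normal(size=(3, 2))

X_samples = rng.normal(loc=M, size=(200_000, 2, 3))

EX_hat   = X_samples.mean(axis=0)              # Monte Carlo estimate of E(X)
EAXB_hat = (A @ X_samples @ B).mean(axis=0)    # Monte Carlo estimate of E(AXB)

print(np.allclose(EX_hat, M, atol=0.02))            # E(X) = M
print(np.allclose(EAXB_hat, A @ M @ B, atol=0.1))   # E(AXB) = A E(X) B
```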

    Covariance

Covariance measures the relationship between two random variables. If three or more random variables are jointly distributed, one must consider the covariances of all possible pairs. For three jointly distributed random variables x, y and z these are the three covariances: σ_xy for x and y, σ_yz for y and z, and σ_xz for x and z. Thus, in dealing with m jointly distributed random variables, it is convenient to collect them into a single vector. A random


vector is one whose components are jointly distributed random variables. Therefore, if x1, x2, ..., xm are m jointly distributed random variables, the vector

X = [x1, x2, ..., xm]ᵀ

is a random vector. If μ1, μ2, ..., μm are the mean values of x1, x2, ..., xm respectively, and μ = [μ1, μ2, ..., μm]ᵀ, then

Σxx = E[(X − μ)(X − μ)ᵀ].

Noting that E[(xi − μi)(xj − μj)] = σij (the covariance of xi and xj), and that σij = σji when i ≠ j, we obtain the symmetric matrix

Σxx = [ σ1²  σ12  ...  σ1m
        σ21  σ2²  ...  σ2m
        ...              
        σm1  σm2  ...  σm² ].

Note: the variances of the individual random variables form the main diagonal of Σxx; Σxx is the variance-covariance matrix of X. If the random variables in X are uncorrelated, all covariance (off-diagonal) elements of Σxx are zero and the matrix is diagonal. The relationship between the weight matrix W and the corresponding variance-covariance matrix, with subscripts added to indicate reference to the random vector X, is restated as

Wxx = σ0² Σxx⁻¹,

where σ0² is the reference variance.

Caution: if Wxx is non-diagonal, the simple weights calculated in


W1 = σ0² / σ1²,
W2 = σ0² / σ2²,
...
Wm = σ0² / σm²,      (4-16)

are not to be used as the diagonal elements of Wxx; only when Wxx is diagonal are the weights calculated in (4-16) identical to its diagonal elements.

Example 1

Two observations are represented by the random vector

X = [X1, X2]ᵀ.

The variances of X1 and X2 are σ1² and σ2² respectively, the covariance of X1 and X2 is σ12, and the correlation coefficient is ρ12.

(a) For a selected reference variance σ0², derive the weight matrix of X in terms of the given parameters.

(b) Show that the weights calculated in (4-16) are identical to the diagonal elements of the weight matrix only when ρ12 = 0.

Solution

(a) The weight matrix of X is

Wxx = σ0² Σxx⁻¹ = σ0² [ σ1²  σ12
                        σ12  σ2² ]⁻¹ = (σ0² / (σ1²σ2² − σ12²)) [  σ2²  −σ12
                                                                  −σ12   σ1² ].

We know that σ12 = ρ12 σ1 σ2, and hence σ1²σ2² − σ12² = σ1²σ2²(1 − ρ12²); thus

Wxx = (σ0² / (σ1²σ2²(1 − ρ12²))) [  σ2²  −σ12
                                    −σ12   σ1² ].

(b) From (4-16),

W1 = σ0² / σ1²   and   W2 = σ0² / σ2².


The diagonal elements of Wxx are σ0² / (σ1²(1 − ρ12²)) and σ0² / (σ2²(1 − ρ12²)), and these equal W1 and W2 only when ρ12 = 0. When ρ12 ≠ 0, the weights W1 and W2 cannot be used as the diagonal elements of Wxx.

Each element of Σxx can be divided by σ0² to yield a scaled version of Σxx called Qxx (the cofactor matrix of X):

Qxx = (1/σ0²) Σxx,  or equivalently  Σxx = σ0² Qxx.

Qxx is also called the relative covariance matrix.
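A minimal numerical sketch of these relationships, using the hypothetical values σ0² = 1, σ1 = 2, σ2 = 3 and ρ12 = 0.5 (illustrative numbers, not from the text):

```python
import numpy as np

# Hypothetical values (not from the text).
sigma0_sq = 1.0            # reference variance
sigma1, sigma2, rho12 = 2.0, 3.0, 0.5
sigma12 = rho12 * sigma1 * sigma2

# Variance-covariance matrix and weight matrix  Wxx = sigma0^2 * inverse(Sigma_xx).
Sigma_xx = np.array([[sigma1**2, sigma12],
                     [sigma12,   sigma2**2]])
W_xx = sigma0_sq * np.linalg.inv(Sigma_xx)

# Simple weights from (4-16); they equal the diagonal of Wxx only when rho12 = 0.
W1, W2 = sigma0_sq / sigma1**2, sigma0_sq / sigma2**2
print(np.diag(W_xx))   # [0.3333, 0.1481] = sigma0^2 / (sigma_i^2 (1 - rho12^2))
print(W1, W2)          # 0.25, 0.1111

# Cofactor (relative covariance) matrix  Qxx = Sigma_xx / sigma0^2.
Q_xx = Sigma_xx / sigma0_sq
```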

The variance-covariance matrix (or covariance matrix) of an m × 1 random vector x is the m × m matrix V(x) (also written Var(x) or cov(x)) defined by

V(x) = E[(x − E(x))(x − E(x))ᵀ],

when the expectations all exist. In particular, V(x) is symmetric, and it is diagonal if the elements of x are independent.

Properties

If a is a constant vector, V(a) = 0.
If A is a constant matrix and b a constant vector, V(Ax + b) = A V(x) Aᵀ.
V(x) is always non-negative definite.

The covariance between the random vector x1 and the random vector x2 is defined to be the matrix

Cov(x1, x2) = E[(x1 − E(x1))(x2 − E(x2))ᵀ],

when all expectations exist. If a and b are constants, Cov(a x1, b x2) = a b Cov(x1, x2). If A and B are constant matrices and c and d are constant vectors,

Cov(A x1 + c, x2) = A Cov(x1, x2)


and

Cov(x1, B x2 + d) = Cov(x1, x2) Bᵀ.
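A short Monte Carlo sketch of this transformation rule for cross-covariances; the construction of x1 and x2 and the matrices A, B, c and d below are hypothetical illustrations, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical model: x1 and x2 are correlated 2-vectors built from a common
# 3-vector z of standard normals, so Cov(x1, x2) = M1^T M2 exactly.
N = 1_000_000
M1 = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 0.3]])
M2 = np.array([[0.2, 1.0], [1.0, 0.0], [0.7, 0.4]])
z = rng.normal(size=(N, 3))
x1, x2 = z @ M1, z @ M2

A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [1.0, 1.0]])
c, d = np.array([5.0, -1.0]), np.array([0.5, 2.0])

def cross_cov(u, v):
    """Sample cross-covariance  Cov(u, v) = E[(u - Eu)(v - Ev)^T]."""
    uc, vc = u - u.mean(axis=0), v - v.mean(axis=0)
    return uc.T @ vc / (len(u) - 1)

lhs = cross_cov(x1 @ A.T + c, x2 @ B.T + d)   # Cov(A x1 + c, B x2 + d), sampled
rhs = A @ (M1.T @ M2) @ B.T                   # A Cov(x1, x2) B^T, exact
print(np.allclose(lhs, rhs, atol=0.1))        # True (up to sampling error)
```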

Conditional expectation

For two random matrices X1 and X2, the conditional expectation E(X1 | X2 = A) is the expectation of X1 defined using the conditional distribution of its elements given X2 = A (A being a constant matrix). The conditional expectation E(X1 | X2) is the expectation of X1 defined using the conditional distribution of its elements given X2.

The double expectation formula is E(E(X1 | X2)) = E(X1).

The conditional variance-covariance matrix V(x1 | X2 = A), or V(x1 | X2), for a random vector x1 is defined by substituting conditional expectations appropriately into the definition of the variance-covariance matrix.

The conditional variance formula applies:

V(x1) = E(V(x1 | X2)) + V(E(x1 | X2)).

For random vectors x1 and x2, the conditional covariance Cov(x1, x2 | x3 = A), or Cov(x1, x2 | x3), is defined by putting the appropriate conditional expectations into the definition of the covariance. The corresponding covariance formula is

Cov(x1, x2) = E(Cov(x1, x2 | x3)) + Cov(E(x1 | x3), E(x2 | x3)).
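A minimal simulation sketch of the double expectation and conditional variance formulas, using an illustrative scalar model (X2 standard normal and, given X2, X1 normal with mean 2·X2 and standard deviation 3; these values are assumptions for the demonstration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative scalar model (assumed, not from the text): X2 ~ N(0, 1) and,
# given X2, X1 ~ N(2*X2, 3^2), so E(X1 | X2) = 2*X2 and V(X1 | X2) = 9.
N = 1_000_000
x2 = rng.normal(size=N)
x1 = rng.normal(loc=2.0 * x2, scale=3.0)

# Double expectation formula: E(E(X1 | X2)) = E(X1).
print(np.mean(2.0 * x2), np.mean(x1))        # both approximately 0

# Conditional variance formula: V(X1) = E(V(X1 | X2)) + V(E(X1 | X2)) = 9 + 4 = 13.
print(np.var(x1), 9.0 + np.var(2.0 * x2))    # both approximately 13
```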

