2a-prob.pdf

  • Economics 420

    Introduction to Econometrics

    Professor Woodbury

    Fall Semester 2015

    Experiments and Basic Statistics

    1. The probability framework for statistical inference

    2. Estimation

    3. Hypothesis testing

    4. Confidence intervals

  • 1. The probability framework for statistical inference

    a. Population, random variable, distribution

    b. Moments of a distribution: mean, variance, standard deviation

    c. Two random variables: joint distributions, covariance, correlation

    d. Conditional distributions and conditional means

    e. Distribution of a sample of data drawn randomly from a population: $Y_1, \ldots, Y_n$

  • a. Population, random variable, distribution

    Population

    The group or collection of all possible entities of interest (for example, school districts)

    We will think of populations as infinitely large ($\infty$ is an approximation to "very big")

    Random variable Y

    Numerical summary of a random outcome (district average test score, district STR). It may be discrete (a finite number of outcomes, like the roll of a die) or continuous (takes on any particular real value with probability 0)

  • Population distribution (density function) of Y

    The probabilities of different values of Y that occur in the population, for example Pr[Y = 650] (when Y is discrete)

    OR

    The probabilities of sets of these values, for example, $\Pr[640 \le Y \le 660]$ (when Y is continuous).

    Note: We will mainly be using discrete random variables; think of a histogram

  • For example, for a discrete random variable Y, the probability

    distribution can be written:

    $p_j = \Pr(Y = y_j)$, for $j = 1, 2, 3, \ldots, k$

    A probability density function summarizes what we know about

    the outcomes of Y:

    $f(y_j) = p_j$, for $j = 1, 2, 3, \ldots, k$

    For example, for the roll of a die, this would be:

    $f(1) = 1/6, \quad f(2) = 1/6, \quad f(3) = 1/6$

    $f(4) = 1/6, \quad f(5) = 1/6, \quad f(6) = 1/6$

  • Or, if you want to be better organized:

    Outcome               Y=1    Y=2    Y=3    Y=4    Y=5    Y=6
    Probability f(y_j)    0.167  0.167  0.167  0.167  0.167  0.167

  • Cumulative distribution function

    The cumulative probability distribution is the probability that the

    random variable is less than or equal to a particular value:

    $F(y) = \Pr(Y \le y)$

    Outcome                          Y≤1    Y≤2    Y≤3    Y≤4    Y≤5    Y≤6
    Cumulative probability F(y_j)    0.167  0.333  0.50   0.667  0.833  1.0

    The cumulative probability is more useful with continuous

    distributions than with discrete distributions, but the idea is the

    same (more later)
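
    As a rough illustration of the two tables for the roll of a die, the sketch below builds the probability function f(y) and accumulates it into F(y). It assumes a fair six-sided die; the variable names `pmf` and `cdf` are illustrative and not part of the slides.

```python
from fractions import Fraction
from itertools import accumulate

# Assumption: a fair six-sided die, so each outcome has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
pmf = {y: Fraction(1, 6) for y in outcomes}

# F(y) = Pr(Y <= y) is the running sum of f over outcomes up to y.
cdf = dict(zip(outcomes, accumulate(pmf[y] for y in outcomes)))

for y in outcomes:
    print(f"y = {y}   f(y) = {float(pmf[y]):.3f}   F(y) = {float(cdf[y]):.3f}")
```

    Exact fractions are used so that the probabilities sum to exactly 1; the rounded decimals are only for display.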

  • b. Moments of a population distribution: mean, variance, standard deviation

    mean = expected value (expectation) of Y

    = E(Y)

    = $\mu_Y$

    You can think of this as the long-run average value of Y over

    repeated realizations of Y
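
    The "long-run average" interpretation can be checked with a small simulation. This is only a sketch: it assumes a fair die, and the function name `long_run_average` and the seed are illustrative choices, not part of the slides.

```python
import random

def long_run_average(n_rolls, seed=0):
    """Average of n_rolls simulated rolls of a fair die (assumed fair)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

# Population mean of a fair die: each of 1..6 has probability 1/6.
expected_value = sum(range(1, 7)) / 6

for n in (10, 1_000, 100_000):
    print(n, round(long_run_average(n), 3), "vs E(Y) =", expected_value)
```

    With more rolls, the sample average should settle closer to E(Y), which is what "long-run average over repeated realizations" means.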

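  • For a discrete random variable (one that takes on a finite or discrete set of values, like the outcome of rolling a die: 1, 2, 3, 4, 5, 6), the mean is the probability-weighted average of the possible values:

    $E(Y) = y_1 p_1 + y_2 p_2 + \cdots + y_k p_k = \sum_{j=1}^{k} y_j p_j$

    (Key Concept 2.1. Copyright 2003 by Pearson Education, Inc.)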

  • Moments (continued)

    variance = $E(Y - \mu_Y)^2$

    = $\sigma_Y^2$

    = measure of the squared spread of the distribution

    standard deviation = $\sqrt{\text{variance}}$ = $\sigma_Y$

    = measure of the spread of a distribution

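  • Again for a discrete random variable, the variance is the probability-weighted average of the squared deviations from the mean:

    $\operatorname{var}(Y) = \sigma_Y^2 = \sum_{j=1}^{k} (y_j - \mu_Y)^2 p_j$

    (Key Concept 2.2. Copyright 2003 by Pearson Education, Inc.)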
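
    For a discrete random variable, the mean is the probability-weighted sum of the outcomes and the variance is the probability-weighted sum of squared deviations from the mean. The sketch below applies those two definitions to a fair die; the helper name `moments` and the dictionary layout are assumptions made for illustration.

```python
import math

def moments(pmf):
    """Return (mean, variance, standard deviation) of a discrete distribution.

    pmf maps each outcome y_j to its probability p_j (hypothetical layout).
    """
    mean = sum(y * p for y, p in pmf.items())
    variance = sum((y - mean) ** 2 * p for y, p in pmf.items())
    return mean, variance, math.sqrt(variance)

# Assumption: a fair six-sided die.
die_pmf = {y: 1 / 6 for y in range(1, 7)}
mean, variance, sd = moments(die_pmf)
print(f"mean = {mean:.3f}, variance = {variance:.3f}, sd = {sd:.3f}")
```

    The same function works for any finite distribution, as long as the probabilities sum to 1.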

  • Exercise

    Let Y be the number of heads that occur when two coins are

    tossed.

    a. What is the probability density function of Y?

    b. What is the cumulative distribution of Y?

    c. What are the mean, variance, and standard deviation of Y?

  • a. What is the probability density function of Y?

    Outcome (number of heads)    Y = 0   Y = 1   Y = 2
    Probability f(y_j)           0.25    0.50    0.25

    b. What is the cumulative distribution of Y?

    Outcome (number of heads)        Y ≤ 0   Y ≤ 1   Y ≤ 2
    Cumulative probability F(y_j)    0.25    0.75    1.0

  • c. What are the mean, variance, and standard deviation of Y?

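    $\mu_Y = E(Y) = (0 \times 0.25) + (1 \times 0.50) + (2 \times 0.25) = 1.00$

    $\operatorname{var}(Y) = [(0-1)^2 \times 0.25] + [(1-1)^2 \times 0.50] + [(2-1)^2 \times 0.25]$

    $= (1 \times 0.25) + (0 \times 0.50) + (1 \times 0.25) = 0.50$

    $\sigma_Y = \sqrt{\operatorname{var}(Y)} = \sqrt{0.50} = 0.707$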
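
    The exercise can also be checked by brute force: list the four equally likely outcomes of two tosses, count heads in each, and compute the distribution and its moments. This is a sketch that assumes two fair, independent coins; all variable names are illustrative.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Assumption: two fair, independent coins, so HH, HT, TH, TT are equally likely.
tosses = list(product("HT", repeat=2))
counts = Counter(toss.count("H") for toss in tosses)

# a. Probability function of Y = number of heads.
pmf = {y: Fraction(c, len(tosses)) for y, c in sorted(counts.items())}

# b. Cumulative distribution: running sum of the probabilities.
cdf = dict(zip(pmf, accumulate(pmf.values())))

# c. Mean, variance, and standard deviation.
mean = sum(y * p for y, p in pmf.items())
variance = sum((y - mean) ** 2 * p for y, p in pmf.items())
sd = math.sqrt(variance)

print("pmf:", {y: float(p) for y, p in pmf.items()})
print("cdf:", {y: float(p) for y, p in cdf.items()})
print(f"mean = {float(mean):.2f}, variance = {float(variance):.2f}, sd = {sd:.3f}")
```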

  • Notes on continuous distributions

    When computing probabilities for continuous random variables, it is easiest to work with the cumulative distribution function (cdf). If X is any random variable, then its cdf is defined for any real number x by

    $F(x) \equiv P(X \le x)$. [B.6]

    For discrete random variables, (B.6) is obtained by summing the pdf over all values $x_j$ such that $x_j \le x$. For a continuous random variable, F(x) is the area under the pdf, f, to the left of the point x. Because F(x) is simply a probability, it is always between 0 and 1. Further, if $x_1 < x_2$, then $P(X \le x_1) \le P(X \le x_2)$, that is, $F(x_1) \le F(x_2)$. This means that a cdf is an increasing (or at least a nondecreasing) function of x.

    Two important properties of cdfs that are useful for computing probabilities are the following:

    For any number c, $P(X > c) = 1 - F(c)$. [B.7]

    For any numbers $a < b$, $P(a < X \le b) = F(b) - F(a)$. [B.8]
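
    To see properties (B.7) and (B.8) in use, the sketch below applies them to one particular cdf. The standard normal distribution is chosen here purely as an illustrative assumption (the notes do not single out a distribution), and the helper names `normal_cdf`, `prob_greater`, and `prob_between` are hypothetical.

```python
import math

def normal_cdf(x):
    """cdf of a standard normal random variable: F(x) = P(X <= x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_greater(c, cdf=normal_cdf):
    """P(X > c) = 1 - F(c), property (B.7)."""
    return 1.0 - cdf(c)

def prob_between(a, b, cdf=normal_cdf):
    """P(a < X <= b) = F(b) - F(a) for a < b, property (B.8)."""
    if not a < b:
        raise ValueError("requires a < b")
    return cdf(b) - cdf(a)

print(round(prob_greater(0.0), 3))
print(round(prob_between(-1.96, 1.96), 3))
```

    Any other cdf can be passed through the `cdf` argument, since both properties hold for every random variable.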

    In our study of econometrics, we will use cdfs to compute probabilities only for continuous random variables, in which case it does not matter whether inequalities in probability statements are strict or not. That is, for a continuous random variable X,

    [Figure B.2: The probability that X lies between the points a and b. Axes: x (horizontal), f(x) (vertical). Source: Cengage Learning, 2013.]
