
Probability Theory and Statistics

    Peter Jochumzen

    April 18, 2016

Contents

1 Probability Theory And Statistics

1.1 Experiment, Outcome and Event
1.2 Probability
1.3 Rules of Probability
1.4 Conditional Probabilities and Independent Events
1.5 Random Variables
1.6 Discrete and Continuous Random Variables
1.7 Probability Mass Function
1.8 Cumulative Distribution Function
1.9 Cumulative Distribution Function, Discrete Random Variables
1.10 Cumulative Distribution Function, Continuous Random Variable
1.11 Probability Density Function
1.12 Function of a Random Variable
1.13 Expected Value
1.14 Variance
1.15 The Constant Random Variable
1.16 The Discrete Uniform Distribution
1.17 The Bernoulli Distribution
1.18 The Binomial Distribution
1.19 The Continuous Uniform Distribution
1.20 The Exponential Distribution
1.21 The Normal Distribution
1.22 Two Random Variables
1.23 Probability Mass Function, Two Random Variables
1.24 Marginal Probability Mass Function
1.25 Cumulative Distribution Function, Two Random Variables
1.26 Probability Density Function, Two Continuous Random Variables
1.27 Marginal Probability Density Function
1.28 Conditional Distributions, Discrete Random Variables
1.29 Conditional Distributions, Continuous Random Variables
1.30 Independent Random Variables
1.31 Conditional Expectation
1.32 Function of Two Random Variables
1.33 Expected Value and Variance of a Linear Function of Two Random Variables
1.34 Expected Value of a Product of Two Random Variables
1.35 Covariance of Two Random Variables
1.36 Covariance of Two Random Variables, Results
1.37 Correlation
1.38 Several Random Variables
1.39 Random Sample
1.40 Random Sample of Vectors
1.41 Statistics
1.42 Sample Mean
1.43 Sample Variance
1.44 Sample Covariance
1.45 Properties of the Sample Mean
1.46 Properties of the Sample Covariance
1.47 The Chi-square Distribution
1.48 Properties of the Sample Variance
1.49 The Standard Error of the Sample Mean
1.50 The t-distribution
1.51 The F-distribution
1.52 Statistical Model
1.53 Estimator
1.54 Properties of Estimators
1.55 Standard Error
1.56 The Z-statistics
1.57 Generalizing the Z-statistics
1.58 The t-statistics
1.59 Generalizing the t-statistics
1.60 Critical Values
1.61 Hypothesis Testing, Theory
1.62 Hypothesis Tests Involving the Mean
1.63 Confidence Intervals
1.64 Asymptotically Unbiased Estimator
1.65 Consistency
1.66 plim Rules
1.67 Convergence in Distribution
1.68 Central Limit Theorem

Chapter 1

    Probability Theory And Statistics

    1.1 Experiment, Outcome and Event

• The sample space S of an experiment is the set of all possible outcomes of that experiment. The sample space S is the universal set.

• Example. Experiment: Toss a die. Sample space S = {1, 2, 3, 4, 5, 6}.

    • We say that the sample space is finite if it contains a finite number of elements.

    • Elements of the sample space are called outcomes or possible outcomes.

    • An event is a subset of the sample space S.

    • Example (continued). A = {1, 3, 5} is the event ”toss an odd number”.

    • The sample space is called a certain event and an empty set ∅ is called an impossible event.

    • An event containing only one outcome is called an elementary event.

    • If A,B are two events then A ∪B, A ∩B, AC (the complement of A in S) and so on are events.

    • Two events A,B are called mutually exclusive if A ∩B = ∅

    1.2 Probability

    • If A is an event then P (A) denotes the probability that the experiment will result in an outcome in A

• Example. Toss a fair die and let A = {1, 3, 5}, B = {4}. Then P (A) = 1/2, P (B) = 1/6.

• If S is a finite sample space with n outcomes then we say that each outcome is equally likely if P (A) = 1/n for each elementary event A.

• Example. Each outcome of a fair die is equally likely.

    • If the outcomes of an experiment are equally likely then

P (A) = k/n

where A is an arbitrary event with k outcomes and n is the number of outcomes of the sample space.
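
• A minimal Python sketch of the counting rule (not part of the original notes; the die example is an assumption):

    # Equally likely outcomes: P(A) = k/n, checked for the event "toss an odd number".
    S = {1, 2, 3, 4, 5, 6}            # sample space of a fair die
    A = {1, 3, 5}                     # event: odd number

    p_A = len(A) / len(S)             # k outcomes in A divided by n outcomes in S
    print(p_A)                        # 0.5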

    1.3 Rules of Probability

    In this section, A,B are arbitrary events in the sample space S.

1. Probabilities are between 0 and 1: 0 ≤ P (A) ≤ 1

2. Probabilities of certain and impossible events:

    P (S) = 1 P (∅) = 0

    3. Probabilities of mutually exclusive events: If A,B are mutually exclusive events then

    P (A ∪B) = P (A) + P (B)

    (This rule can be extended, in a trivial way, to the case when we have many mutually exclusive events)

4. Probabilities of complements: P (AC) = 1 − P (A)

    5. Probabilities of subsets: If A ⊂ B then

    P (A) ≤ P (B)

6. Probabilities of unions: P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

    1.4 Conditional Probabilities and Independent Events

    In this section, A,B are arbitrary events in the sample space S.

    • If P (B) > 0 we define the conditional probability of the event A given B, denoted by P (A|B), as

P (A|B) = P (A ∩ B) / P (B)

    • We say that A,B are independent events if

    P (A ∩B) = P (A)P (B)

• If P (B) > 0 then A,B are independent events if and only if P (A|B) = P (A)

    1.5 Random Variables

• Simplified definition: A random variable on a sample space S is a function or a mapping from S to R. If the random variable is denoted by X then X : S → R.

• If c is a constant then X = c is an event defined as the collection of outcomes that X will map to c.

• If c is a constant then P (X = c) is the probability of the event X = c.

• Similarly, if a, b are constants, X < a, X ≥ b, a < X ≤ b and so on are events with probabilities P (X < a), P (X ≥ b), P (a < X ≤ b) and so on.

• More generally, if A is an arbitrary subset of R then X ∈ A is an event defined as the collection of outcomes that X will map to any number in A, and P (X ∈ A) is the probability of this event.

    1.6 Discrete and Continuous Random Variables

• If X is a random variable then we define the range of X as the collection of real numbers that X can take. The range of X is typically denoted by R or RX .

• A finite set is a set with a finite number of elements.

• The set of natural numbers N = {1, 2, 3, · · · } is a countably infinite set.

• The set of integers Z = {0,±1,±2,±3, · · · } is also a countably infinite set and so is any infinite subset of Z.

• The set of real numbers R is an uncountably infinite set.

• All intervals such as (a, b), [a, b], (a, b], [a, b), (−∞, b), (a,∞), where a < b are real numbers, are uncountably infinite sets.

• If the range of X is finite or countably infinite then we say that X is a discrete random variable.

• If the range of X is uncountably infinite and P (X = c) = 0 for all real numbers c then we say that X is a continuous random variable.

    1.7 Probability Mass Function

In this section, X is a discrete random variable so that R, the range of X, is finite or countably infinite.

• If the range of X is finite, we denote the range by R = {x1, x2, · · · , xn}. If the range of X is countably infinite, we denote the range by R = {x1, x2, · · · }.

    • For an arbitrary real number x, we define

    f(x) = P (X = x)

    as the probability mass function, pmf, of X.

    • If x is not in the range of X then X = x is an empty set and f(x) = 0.

    • If x is in the range of X then X = x is a non empty set and 0 ≤ f(x) ≤ 1.

    • If A is a subset of the range R then the probability that X takes a value in A is given by

P (X ∈ A) = ∑_{x∈A} f(x)

• The pmf must satisfy

∑_{x∈R} f(x) = 1

If the range is finite, the expression can also be written

∑_{i=1}^n f(xi) = 1

    1.8 Cumulative Distribution Function

    In this section, X is an arbitrary random variable.

    • For an arbitrary real number x, we define

    F (x) = P (X ≤ x)

    as the cumulative distribution function, cdf of X.

    • The cdf F has the following properties:

    1. F is an increasing function.

2. F is continuous from the right (but not necessarily continuous).

    3. limx→−∞ F (x) = 0

    4. limx→∞ F (x) = 1

1.9 Cumulative Distribution Function, Discrete Random Variables

    In this section, X is a discrete random variable with pmf f and cdf F .

• We define

F (x0−) = lim_{x→x0−} F (x)

(see "left-hand limits")

• We have the following results:

F (x−) = P (X < x)

f(x) = F (x) − F (x−)

F (x) = ∑_{xi≤x} f(xi)

    P (a < X ≤ b) = F (b)− F (a)

    P (a ≤ X ≤ b) = F (b)− F (a−)

    P (a < X < b) = F (b−)− F (a)

    P (a ≤ X < b) = F (b−)− F (a−)

1.10 Cumulative Distribution Function, Continuous Random Variable

    In this section, X is a continuous random variable.

    • If X is a continuous random variable then F is a continuous function.

    • F (x−) = F (x) and P (X ≤ x) = P (X < x) (remember, P (X = x) = 0)

    • We have the following results

    P (a < X ≤ b) = P (a ≤ X ≤ b) = P (a < X < b) = P (a ≤ X < b) = F (b)− F (a)

    1.11 Probability Density Function

    In this section, X is a continuous random variable with cdf F .

    • The probability density function (pdf) of X is defined as

f(x) = dF (x)/dx

(assuming that the derivative exists).

    • f(x) is not a probability.

    • If x is not in the range of X then f(x) = 0.

    • If x is in the range of X then f(x) ≥ 0 (with no upper limit).

    • If A is a subset of the range R then the probability that X takes a value in A is given by

P (X ∈ A) = ∫_{x∈A} f(x) dx

• In particular, if A is an interval, A = [a, b], then

P (a ≤ X ≤ b) = ∫_a^b f(x) dx

• The pdf of any continuous random variable must satisfy

1. Non-negative: f(x) ≥ 0 for all x.

2. Integrate to 1:

∫_{x∈R} f(x) dx = 1

Since f(x) = 0 when x is outside the range of X we can also write this as

∫_{−∞}^{∞} f(x) dx = 1

    • Calculating the cdf from the pdf:

F (x) = ∫_{−∞}^x f(t) dt
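
• The pdf–cdf relationship can be checked numerically; a short Python sketch assuming an exponential pdf with λ = 2 (any pdf would do):

    import math
    from scipy.integrate import quad

    lam = 2.0                                        # assumed parameter
    f = lambda x: (1 / lam) * math.exp(-x / lam)     # pdf on [0, infinity)
    F = lambda x: 1 - math.exp(-x / lam)             # closed-form cdf

    total, _ = quad(f, 0, math.inf)                  # the pdf must integrate to 1
    prob, _ = quad(f, 1, 3)                          # P(1 <= X <= 3)
    print(round(total, 6), round(prob, 6), round(F(3) - F(1), 6))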

    1.12 Function of a Random Variable

• If X is a random variable and g is a real-valued function of one real variable with a domain equal to the range of X then Y = g(X) is a new random variable. The range of Y is given by the range of g.

    1.13 Expected Value

    • If X is a discrete random variable then we define the expected value of X, denoted by E(X), as

E(X) = ∑_{x∈R} x f(x)

where f(x) is the pmf of X. We often use the symbol µ for E(X), or µX if we need to clarify the name of the random variable.

• If X is a continuous random variable then we define the expected value of X, also denoted by E(X), as

E(X) = ∫_{x∈R} x f(x) dx

where f(x) is the pdf of X.

• If X is a discrete random variable and Y = g(X) then the expected value of Y is given by

E(Y ) = ∑_{x∈R} g(x) f(x)

• If X is a continuous random variable and Y = g(X) then the expected value of Y is given by

E(Y ) = ∫_{x∈R} g(x) f(x) dx

• If X is any random variable and g is a linear function, Y = a + bX, then the expected value of Y is given by

    E(Y ) = a+ bE(X)

    1.14 Variance

    • If X is an arbitrary random variable with expected value µ then we define the variance of X as

    V ar(X) = E[(X − µ)2] = E(X2)− µ2

We often write the right-hand side as E(X − µ)2 with the implicit understanding that this is the expected value of (X − µ)2, not the square of E(X − µ).

• We often use the symbol σ2 for V ar(X), or σ2X if we need to clarify the name of the random variable.

• If X is discrete, it follows that

V ar(X) = ∑_{x∈R} (x − µ)2 f(x)

• If X is continuous, it follows that

V ar(X) = ∫_{x∈R} (x − µ)2 f(x) dx

• If X is a discrete random variable and Y = g(X) with E(Y ) = µY then the variance of Y is given by

V ar(Y ) = ∑_{x∈R} (g(x) − µY )2 f(x)

• If X is a continuous random variable and Y = g(X) with E(Y ) = µY then the variance of Y is given by

V ar(Y ) = ∫_{x∈R} (g(x) − µY )2 f(x) dx

• If X is any random variable and g is a linear function, Y = a + bX, then the variance of Y is given by

    V ar(Y ) = b2V ar(X)

    • The standard deviation of a random variable X is defined as the square root of the variance,

σ = √V ar(X)
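
• A short Python sketch of these definitions for a discrete random variable (the fair-die pmf is an assumed example):

    # E(X) = sum of x f(x); Var(X) = sum of (x - mu)^2 f(x).
    pmf = {x: 1 / 6 for x in range(1, 7)}            # pmf of a fair die

    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    sd = var ** 0.5                                  # standard deviation

    print(mu, var, sd)                               # 3.5, 2.916..., 1.707...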

    1.15 The Constant Random Variable

• A discrete random variable X whose range is a single number c is called a constant random variable or simply a constant and we write X = c.

• The pmf of a constant random variable is given by f(c) = 1, f(x) = 0 for all x ≠ c.

• We have E(X) = c and V ar(X) = 0; the expected value of a constant is the constant itself and the variance of a constant is zero.

    1.16 The Discrete Uniform Distribution

• A discrete random variable X that takes n different values with equal probability is said to follow a discrete uniform distribution.

• Formally, the range of X is R = {x1, x2, · · · , xn} and the pmf is given by f(xi) = 1/n for i = 1, · · · , n.

• We have

µ = E(X) = (1/n) ∑_{i=1}^n xi

and

V ar(X) = (1/n) ∑_{i=1}^n (xi − µ)2 = (1/n) ∑_{i=1}^n xi^2 − µ2

1.17 The Bernoulli Distribution

    • A discrete random variable X with range R = {0, 1} is said to follow a Bernoulli distribution.

• We typically denote f(1) = P (X = 1) by p such that f(0) = P (X = 0) = 1 − p with 0 ≤ p ≤ 1 and write

X ∼ Ber(p).

• We have E(X) = p, V ar(X) = p(1 − p)

    1.18 The Binomial Distribution

    • A discrete random variable X with range R = {0, 1, · · · , n} and pmf

P (X = k) = f(k) = (n choose k) p^k (1 − p)^(n−k)

where

(n choose k) = n! / (k! (n − k)!)

is said to follow a binomial distribution and we write

X ∼ B(n, p)

• We have E(X) = np, V ar(X) = np(1 − p)
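
• These moments can be checked against scipy.stats; a sketch with assumed values n = 10, p = 0.3:

    from scipy.stats import binom

    n, p = 10, 0.3                                   # assumed parameters
    print(binom.pmf(3, n, p))                        # P(X = 3)
    print(binom.mean(n, p), n * p)                   # both equal E(X) = np
    print(binom.var(n, p), n * p * (1 - p))          # both equal Var(X) = np(1 - p)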

    1.19 The Continuous Uniform Distribution

    • A continuous random variable X with range R = [a, b], b > a and pdf

f(x) = 1/(b − a)

is said to follow a continuous uniform distribution and we write

X ∼ U [a, b]

• We have

E(X) = (a + b)/2, V ar(X) = (b − a)2/12

• If X ∼ U [a, b] and Y = c + dX is a linear function of X with d ≠ 0 then Y will also follow a uniform distribution.

    1.20 The Exponential Distribution

    • A continuous random variable X with range R = [0,∞) and pdf

f(x) = (1/λ) e^(−x/λ)

is said to follow an exponential distribution with parameter λ and we write

X ∼ exp(λ)

where λ > 0.

• We have E(X) = λ, V ar(X) = λ2

• The probability density function of an exponential distribution is sometimes written as

f(x) = λ e^(−λx)

Written like this, E(X) = 1/λ and V ar(X) = 1/λ2.

1.21 The Normal Distribution

    • A continuous random variable X with range R = (−∞,∞) and pdf

f(x) = (1/√(2πσ2)) exp(−(x − µ)2 / (2σ2))

is said to follow a normal distribution with parameters µ, σ2 and we write

X ∼ N(µ, σ2)

• We have E(X) = µ, V ar(X) = σ2

• If µ = 0 and σ2 = 1 we say that X follows a standard normal distribution. A random variable that follows a standard normal distribution is typically denoted by Z and Z ∼ N(0, 1) with E(Z) = 0, V ar(Z) = 1 and with probability density function

f(z) = (1/√(2π)) exp(−z2/2)

• If X ∼ N(µX , σ2X) and Y = a + bX is a linear function of X with b ≠ 0 then

Y ∼ N(µY , σ2Y )

where µY = a + bµX and σ2Y = b2 σ2X .

• If X ∼ N(µ, σ2) then

(X − µ)/σ ∼ N(0, 1)

    • Excel: If X ∼ N with expected value m and standard deviation s then

– P (X ≤ x): NORM.DIST(x,m,s,TRUE)
– P (X ≥ x): 1 − NORM.DIST(x,m,s,TRUE)
– the value x such that P (X ≤ x) = p: NORM.INV(p,m,s)
– If X ∼ N(0, 1) then you can use NORM.S.DIST(x,TRUE) and NORM.S.INV(p)
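
• Outside Excel, the same quantities are available in Python's scipy.stats (a sketch of rough equivalents; m, s, x and p below are assumed values):

    from scipy.stats import norm

    m, s, x, p = 1.0, 2.0, 1.5, 0.975

    print(norm.cdf(x, loc=m, scale=s))     # P(X <= x), like NORM.DIST(x,m,s,TRUE)
    print(norm.sf(x, loc=m, scale=s))      # P(X >= x) = 1 - P(X <= x)
    print(norm.ppf(p, loc=m, scale=s))     # x such that P(X <= x) = p, like NORM.INV(p,m,s)
    print(norm.cdf(x), norm.ppf(p))        # standard normal versions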

    1.22 Two Random Variables

• Given a sample space S, we can define two functions, X : S → R and Y : S → R. We then have two random variables.

• Given two random variables X,Y and two constants x, y, X = x, Y = y is an event. It is the collection of outcomes that X maps to x and Y maps to y.

• The probability of the event X = x, Y = y is denoted by P (X = x, Y = y).

• Similarly, expressions such as X ≤ x, Y ≤ y are events whose probability is P (X ≤ x, Y ≤ y)

• X,Y are called discrete random variables if the range of X and the range of Y are both finite or countably infinite.

• X,Y are called continuous random variables if the range of X and the range of Y are both uncountable and P (X = x, Y = y) = 0 for all x, y.

• The range of X,Y is the collection of all ordered pairs (x, y) such that there is an outcome in S that is mapped to x by X and to y by Y .

1.23 Probability Mass Function, Two Random Variables

    In this section, X,Y are two discrete random variables.

    • For any two real numbers (x, y), we define

    f(x, y) = P (X = x, Y = y)

    as the probability mass function, pmf, of X,Y .

    • If (x, y) is not in the range of X,Y then X = x, Y = y is an empty set and f(x, y) = 0.

    • If (x, y) is in the range of X,Y then X = x, Y = y is a non empty set and 0 ≤ f(x, y) ≤ 1.

    • If A is a subset of the range R then the probability that (X,Y ) takes a value in A is given by

P ((X,Y ) ∈ A) = ∑_{(x,y)∈A} f(x, y)

• The pmf must satisfy

∑_{(x,y)∈R} f(x, y) = 1

    1.24 Marginal Probability Mass Function

    In this section, X,Y are two discrete random variables with range R and pmf f(x, y).

    • The marginal probability mass function of X is defined as

fX(x) = ∑_y f(x, y)

where the sum is over all y such that (x, y) ∈ R.

• The marginal pmf of Y is defined similarly as

fY (y) = ∑_x f(x, y)

• To distinguish f(x, y) from the marginal distributions, we sometimes call it the joint probability mass function or joint pmf and denote it by fX,Y (x, y).

    1.25 Cumulative Distribution Function, Two Random Variables

    In this section, X,Y are two arbitrary random variables.

    • For two real numbers (x, y), we define

    F (x, y) = P (X ≤ x, Y ≤ y)

    as the cumulative distribution function, cdf of X,Y .

1.26 Probability Density Function, Two Continuous Random Variables

In this section, X,Y are two continuous random variables with cdf F (x, y).

    • The probability density function (pdf) of X,Y is defined as

f(x, y) = ∂2F (x, y) / ∂x∂y

(assuming that the derivative exists)

• f(x, y) is not a probability.

    • If (x, y) is not in the range of X,Y then f(x, y) = 0.

    • If (x, y) is in the range of X,Y then f(x, y) ≥ 0 (with no upper limit).

    • f(x, y) is sometimes denoted by fX,Y (x, y)

1.27 Marginal Probability Density Function

In this section, X,Y are two continuous random variables with pdf f(x, y).

    • The marginal probability density function of X is defined as

fX(x) = ∫_y f(x, y) dy

where the integral is over all y such that (x, y) ∈ R.

• Similarly, the marginal pdf of Y is defined as

fY (y) = ∫_x f(x, y) dx

    • f(x, y) is sometimes called the joint probability density function or joint pdf and denoted by fX,Y (x, y).

1.28 Conditional Distributions, Discrete Random Variables

In this section, X,Y are two discrete random variables with joint pmf fX,Y (x, y) and marginal pmfs fX(x) and fY (y).

• X = x and Y = y are two separate events. Therefore, if P (Y = y) ≠ 0 then

P (X = x|Y = y) = P (X = x ∩ Y = y) / P (Y = y) = P (X = x, Y = y) / P (Y = y)

    • The probability P (X = x|Y = y) is called the conditional probability of X given Y

• Similarly, if P (X = x) ≠ 0 then the conditional probability of Y given X is

P (Y = y|X = x) = P (X = x, Y = y) / P (X = x)

• We denote the conditional probability P (X = x|Y = y) by f(x|y), or by fX|Y (x|y) if we need to specify the names of the random variables. P (Y = y|X = x) is denoted by f(y|x) or by fY |X(y|x). f(x|y) is called a conditional probability mass function (of x given y) or a conditional pmf

    • Since P (X = x, Y = y) = fX,Y (x, y) we have

fX|Y (x|y) = fX,Y (x, y) / fY (y)

and

fY |X(y|x) = fX,Y (x, y) / fX(x)

whenever the denominators are not zero.

1.29 Conditional Distributions, Continuous Random Variables

In this section, X,Y are two continuous random variables with joint pdf fX,Y (x, y) and marginal pdfs fX(x) and fY (y).

• We define

f(x|y) = fX,Y (x, y) / fY (y)

as a conditional probability density function (of x given y) whenever fY (y) ≠ 0.

• Similarly, we define

f(y|x) = fX,Y (x, y) / fX(x)

whenever fX(x) ≠ 0.

    1.30 Independent Random Variables

    • Two random variables X,Y are said to be independent if and only if

    f(x, y) = fX(x)fY (y)

    for all (x, y) in the range of X,Y .

• X,Y are independent if and only if f(x|y) = fX(x) for all x, y. Also, X,Y are independent if and only if f(y|x) = fY (y) for all x, y.

    1.31 Conditional Expectation

• If X,Y are two discrete random variables then we define the conditional expectation of X given Y = y as the number

E(X|Y = y) = ∑_x x f(x|y)

E(Y |X = x) is defined similarly.

• If X,Y are two continuous random variables then we define the conditional expectation of X given Y = y as the number

E(X|Y = y) = ∫_x x f(x|y) dx

    E(Y |X = x) is defined similarly.

    • If X,Y are independent random variables then E(X|Y = y) = E(X) and E(Y |X = x) = E(Y ).

    • Without specifying a specific value for Y , E(X|Y ) is a random variable.

• Law of iterated expectation: E(E(X|Y )) = E(X)
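
• The law of iterated expectation can be verified for any small joint pmf in Python; the numbers below are an assumed example:

    # Check E(E(X|Y)) = E(X) for an assumed joint pmf f(x, y).
    f = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

    fY = {}
    for (x, y), p in f.items():
        fY[y] = fY.get(y, 0) + p                     # marginal pmf of Y

    E_X = sum(x * p for (x, y), p in f.items())      # E(X) directly

    E_X_given_Y = {y: sum(x * p for (x, yy), p in f.items() if yy == y) / fY[y] for y in fY}
    iterated = sum(E_X_given_Y[y] * fY[y] for y in fY)

    print(E_X, iterated)                             # both 0.6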

1.32 Function of Two Random Variables

• If X,Y are two random variables and g is a real-valued function of two variables whose domain is equal to the range of X,Y then Z = g(X,Y ) is a new random variable.

• If g is a linear function, g(x, y) = a + bx + cy, then we say that Z = a + bX + cY is a linear function of the random variables X and Y .

1.33 Expected Value and Variance of a Linear Function of Two Random Variables

• If X,Y are two random variables and Z = a + bX + cY is a linear function of X and Y then

E(Z) = E(a + bX + cY ) = a + bE(X) + cE(Y )

• If X,Y are two independent random variables and Z = a + bX + cY is a linear function of X and Y then

    V ar(Z) = V ar(a+ bX + cY ) = b2V ar(X) + c2V ar(Y )

    1.34 Expected Value of a Product of Two Random Variables

    • If X,Y are two independent random variables and Z = XY is a product of X and Y then

    E(Z) = E(XY ) = E(X)E(Y )

• If, in addition, f and g are two real-valued functions of a real variable and Z = f(X)g(Y ) is a product of f(X) and g(Y ) then

E(Z) = E(f(X)g(Y )) = E(f(X))E(g(Y ))

    1.35 Covariance of Two Random Variables

• If X,Y are two random variables with expected values µX and µY respectively then we define the covariance of X and Y , denoted by Cov(X,Y ), as the number

    Cov(X,Y ) = E[(X − µX)(Y − µY )] = E(XY )− µXµY

    • Cov(X,Y ) is sometimes denoted by σX,Y or σXY

    1.36 Covariance of Two Random Variables, Results

1. If X,Y are two independent random variables then Cov(X,Y ) = 0. The opposite is not necessarily true.

    2. Cov(X,X) = V ar(X)

    3. If c is a constant then Cov(X, c) = 0

    4. Covariance of linear functions of random variables (a, b, c, d are constants):

    Cov(a+ bX, c+ dY ) = bdCov(X,Y )

    5. Covariance of linear functions of two random variables (a, b, c, d are constants):

    Cov(aX1 + bX2, cY1 + dY2) = acCov(X1, Y1) + adCov(X1, Y2) + bcCov(X2, Y1) + bdCov(X2, Y2)

    6. If X,Y are two arbitrary random variables then

    V ar(a+ bX + cY ) = b2V ar(X) + c2V ar(Y ) + 2bcCov(X,Y )

1.37 Correlation

• If X,Y are two random variables then we define the correlation between X and Y , denoted by Corr(X,Y ), as the number

Corr(X,Y ) = Cov(X,Y ) / √(V ar(X)V ar(Y ))

    • Corr(X,Y ) is sometimes denoted by ρX,Y or ρXY

    • For any two random variables X,Y ,

    −1 ≤ Corr(X,Y ) ≤ 1

    • We have

    – Corr(X,Y ) = 0 if and only if Cov(X,Y ) = 0. We then say that X and Y are uncorrelated.

    – Corr(X,Y ) > 0 if and only if Cov(X,Y ) > 0. We then say that X and Y are positively correlated.

    – Corr(X,Y ) < 0 if and only if Cov(X,Y ) < 0. We then say that X and Y are negatively correlated.
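
• Covariance and correlation can be computed directly from a joint pmf; a Python sketch with an assumed two-point example:

    # Cov(X, Y) = E(XY) - E(X)E(Y); Corr(X, Y) = Cov / sqrt(Var(X) Var(Y)).
    f = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}   # assumed joint pmf

    E = lambda g: sum(g(x, y) * p for (x, y), p in f.items())

    mx, my = E(lambda x, y: x), E(lambda x, y: y)
    cov = E(lambda x, y: x * y) - mx * my
    var_x = E(lambda x, y: (x - mx) ** 2)
    var_y = E(lambda x, y: (y - my) ** 2)

    print(cov, cov / (var_x * var_y) ** 0.5)          # 0.1 and about 0.408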

    1.38 Several Random Variables

• Given a sample space S, we can define n functions, Xi : S → R for i = 1, · · · , n. We then have n random variables X1, · · · , Xn. Together, they are called a random vector, denoted by X = (X1, · · · , Xn).

• Given n random variables X1, · · · , Xn and n constants x1, · · · , xn, X1 = x1, · · · , Xn = xn is an event. It is the collection of outcomes that X1 maps to x1 and X2 maps to x2 and so on.

    • The probability of the event X1 = x1, · · · , Xn = xn is denoted by P (X1 = x1, · · · , Xn = xn).

    • Similarly, expressions such as X1 ≤ x1, · · · , Xn ≤ xn are events whose probability is P (X1 ≤x1, · · · , Xn ≤ xn)

• X1, · · · , Xn are called discrete random variables if the ranges of all of the Xi's are finite or countably infinite.

• X1, · · · , Xn are called continuous random variables if the ranges of all of the Xi's are uncountable and P (X1 = x1, · · · , Xn = xn) = 0 for all x1, · · · , xn.

• The range of X1, · · · , Xn is the collection of all ordered n-tuples (x1, · · · , xn) such that there is an outcome in S that is mapped to x1 by X1 and so on.

    • If (x1, · · · , xn) are n real numbers then

– If X1, · · · , Xn are n arbitrary random variables then we define the cumulative distribution function, cdf, of X1, · · · , Xn as

F (x1, · · · , xn) = P (X1 ≤ x1, · · · , Xn ≤ xn)

– If X1, · · · , Xn are n discrete random variables then we define the probability mass function, pmf, of X1, · · · , Xn as

f(x1, · · · , xn) = P (X1 = x1, · · · , Xn = xn)

– If X1, · · · , Xn are n continuous random variables then we define the probability density function, pdf, of X1, · · · , Xn as

f(x1, · · · , xn) = ∂^n F (x1, · · · , xn) / (∂x1 · · · ∂xn)

    (assuming that the derivative exists).

• If X1, · · · , Xn are n discrete random variables then we define the marginal probability mass function of X1 as

fX1(x1) = ∑_{x2,··· ,xn} f(x1, · · · , xn)

where the sum is over all x2, · · · , xn such that (x1, · · · , xn) is in the range of X1, · · · , Xn. The marginal pmfs of X2, · · · , Xn are defined similarly.

• If X1, · · · , Xn are n continuous random variables then we define the marginal probability density function of Xi similarly by integrating away the remaining random variables.

    • n random variables X1, · · · , Xn are said to be independent if and only if

    f(x1, · · · , xn) = fX1(x1) · · · fXn(xn)

for all (x1, · · · , xn) in the range of X1, · · · , Xn.

• If X1, · · · , Xn are n random variables and g is a real-valued function of n variables whose domain is equal to the range of X1, · · · , Xn then Z = g(X1, · · · , Xn) is a new random variable.

• If g is a linear function, g(x1, · · · , xn) = c0 + c1x1 + · · · + cnxn, then we say that Z = c0 + c1X1 + · · · + cnXn is a linear function of the random variables X1, · · · , Xn (c0, c1, · · · , cn are constants).

• If X1, · · · , Xn are n random variables and Z = c0 + c1X1 + · · · + cnXn is a linear function of X1, · · · , Xn then

E(Z) = E(c0 + c1X1 + · · · + cnXn) = c0 + c1E(X1) + · · · + cnE(Xn)

• If X1, · · · , Xn are n independent random variables and Z = c0 + c1X1 + · · · + cnXn is a linear function of X1, · · · , Xn then

    V ar(Z) = V ar(c0 + c1X1 + · · ·+ cnXn) = c21V ar(X1) + · · ·+ c2nV ar(Xn)

    • If X1, · · · , Xn are n arbitrary random variables then

V ar(c0 + c1X1 + · · · + cnXn) = ∑_{i=1}^n c2i V ar(Xi) + 2 ∑_{i=1}^n ∑_{j>i} ci cj Cov(Xi, Xj)

1.41 Statistics

• If X1, · · · , Xn is a random sample and g is a real-valued function of n variables with a domain that contains the range of X1, · · · , Xn then Θ = g(X1, · · · , Xn) is called a statistic. As a function of random variables, a statistic is itself a random variable.

• Once the experiment has been performed, the outcome of the random variables X1, · · · , Xn is observed and denoted by the numbers x1, · · · , xn. The outcome of the statistic Θ, the observed value of the statistic, can then be calculated as g(x1, · · · , xn), which is now a number that we view as a drawing from the random variable Θ.

    • The definition may be extended in a natural way to a random sample of vectors.

    1.42 Sample Mean

    • If X1, · · · , Xn is a random sample then we define a statistic called the sample mean, denoted as X̄, by

X̄ = (1/n) ∑_{i=1}^n Xi

    • The observed value of the sample mean is called the observed sample mean and it is denoted by x̄.

    1.43 Sample Variance

• If X1, · · · , Xn is a random sample then we define the sample variance, denoted as S2, by

S2 = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄)2

    • The observed value of the sample variance is called the observed sample variance and it is denoted bys2.

    • S =√S2 is called the sample standard deviation (s is the observed sample standard deviation).

• Never use the term "variance" for S2 or s2. Variance is a property of a random variable while the sample variance S2 is a function of the random variables constituting a random sample. Sample variance is to sample mean as variance is to expected value.

    1.44 Sample Covariance

    • Consider an IID random sample of size n of random vectors of size 2, X1, · · · ,Xn with Xi =(Xi,1, Xi,2).

• We define the sample covariance, denoted as S21,2, by

S21,2 = (1/(n − 1)) ∑_{i=1}^n (Xi,1 − X̄1)(Xi,2 − X̄2)

    where X̄1 is the sample average of X1,1, · · · , Xn,1 and X̄2 is the sample average of X1,2, · · · , Xn,2

• Once the experiment has been performed, the outcome of S21,2, sometimes called the observed sample covariance, is denoted by the number

s21,2 = (1/(n − 1)) ∑_{i=1}^n (xi,1 − x̄1)(xi,2 − x̄2)

    where x̄1 and x̄2 are observed sample averages of the corresponding sub-samples.

• The matrix

S = ( S21    S21,2 )
    ( S22,1  S22   )        (1.1)

is called the sample covariance matrix. Here, S21 and S22 are the sample variances of sub-samples 1 and 2. Note that S is symmetric as S22,1 = S21,2.

• These definitions may be extended to the general case where the random vectors are of size m. The sample covariance matrix will then be an m×m matrix.

    1.45 Properties of the Sample Mean

    • If X1, · · · , Xn is a random sample and E(Xi) = µ for i = 1, · · · , n then

    E(X̄) = µ

    • If X1, · · · , Xn is an IID random sample with E(Xi) = µ and V ar(Xi) = σ2 for i = 1, · · · , n then

E(X̄) = µ, V ar(X̄) = σ2/n

We denote the standard deviation of X̄ by SD(X̄) and we have

SD(X̄) = √V ar(X̄) = σ/√n

    • If X1, · · · , Xn is an IID random sample with Xi ∼ N(µ, σ2) for i = 1, · · · , n then

    X̄ ∼ N(µ, σ2/n)
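
• A small simulation illustrates these properties (a Python sketch; µ = 5, σ = 2, n = 25 are assumed values):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 5.0, 2.0, 25, 100_000

    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)   # many sample means

    print(xbar.mean())      # close to mu
    print(xbar.var())       # close to sigma**2 / n = 0.16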

    1.46 Properties of the Sample Covariance

• If X1, · · · ,Xn is an IID random sample of random vectors of size 2 with E(Xi,1) = µ1, E(Xi,2) = µ2, V ar(Xi,1) = σ21, V ar(Xi,2) = σ22 and Cov(Xi,1, Xi,2) = σ21,2 for i = 1, · · · , n then

    E(S21,2) = σ21,2

    1.47 The Chi-square Distribution

• If Z follows a standard normal distribution then Y = Z2 is said to follow a chi-square distribution with one degree of freedom and we write

Y ∼ χ21

If Z1, · · · , Zk are k independent standard normal random variables, then

Y = ∑_{i=1}^k Z2i

is said to follow a chi-square distribution with k degrees of freedom and we write

Y ∼ χ2k

    • If Y ∼ χ2k then the range of Y is [0,∞)

    • If Y ∼ χ2k then E(Y ) = k and V ar(Y ) = 2k

• If Y1 ∼ χ2k1 and Y2 ∼ χ2k2 and Y1 and Y2 are independent then Y1 + Y2 ∼ χ2k1+k2

• Excel: If Y ∼ χ2k then

– P (Y ≤ y): CHISQ.DIST(y,k,TRUE)
– P (Y ≥ y): CHISQ.DIST.RT(y,k)
– the value y such that P (Y ≤ y) = p: CHISQ.INV(p,k)
– the value y such that P (Y ≥ y) = p: CHISQ.INV.RT(p,k)

1.48 Properties of the Sample Variance

    • If X1, · · · , Xn is an IID random sample with E(Xi) = µ and V ar(Xi) = σ2 for i = 1, · · · , n then

    E(S2) = σ2

• If X1, · · · , Xn is an IID random sample with Xi ∼ N(µ, σ2) for i = 1, · · · , n then X̄ and S2 are independent random variables.

    • If X1, · · · , Xn is an IID random sample with Xi ∼ N(µ, σ2) for i = 1, · · · , n then

(n − 1)S2 / σ2 ∼ χ2n−1

    1.49 The Standard Error of the Sample Mean

• If X1, · · · , Xn is an IID random sample with E(Xi) = µ and V ar(Xi) = σ2 for i = 1, · · · , n then we define the standard error of the sample mean, denoted by SE(X̄), as

SE(X̄) = √(S2/n) = S/√n

    (Remember, SD(X̄) = σ/√n).
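
• Computing SE(X̄) from observed data is a one-liner with numpy (the data below are an assumed sample):

    import numpy as np

    x = np.array([4.1, 5.3, 4.8, 6.0, 5.5, 4.9])   # assumed observations
    n = len(x)

    s2 = x.var(ddof=1)                 # observed sample variance s^2 (divides by n - 1)
    se = np.sqrt(s2 / n)               # standard error of the sample mean
    print(x.mean(), se)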

    1.50 The t-distribution

• If Z follows a standard normal distribution and Y follows a chi-square distribution with k degrees of freedom and Z and Y are independent, then T = Z/√(Y/k) is said to follow a t-distribution with k degrees of freedom and we write

T ∼ tk

    • If T ∼ tk then the range of T is (−∞,∞)

    • If T ∼ tk then E(T ) = 0

• The probability density function of the t-distribution is symmetric around 0, P (T ≥ c) = P (T ≤ −c) for all constants c.

• It has "fatter tails" compared to Z, P (T > c) > P (Z > c) for any constant c > 0.

    • Excel: If T ∼ tk then

– P (T ≤ t): T.DIST(t,k,TRUE)
– P (T ≥ t): T.DIST.RT(t,k)
– P (|T | ≥ t) = P (T ≥ t) + P (T ≤ −t): T.DIST.2T(t,k)
– the value t such that P (T ≤ t) = p: T.INV(p,k)
– the value t such that P (|T | ≥ t) = p: T.INV.2T(p,k)

    1.51 The F-distribution

• If X follows a chi-square distribution with k1 degrees of freedom and Y follows a chi-square distribution with k2 degrees of freedom and X and Y are independent random variables, then

F = (X/k1) / (Y/k2)

    is said to follow an F-distribution with k1 and k2 degrees of freedom and we write

    F ∼ F (k1, k2)

• If F ∼ F (k1, k2) then the range of F is [0,∞)

    • Excel: If F ∼ F (k1, k2) then

– P (F ≤ f): F.DIST(f,k1,k2,TRUE)
– P (F ≥ f): F.DIST.RT(f,k1,k2)
– the value f such that P (F ≤ f) = p: F.INV(p,k1,k2)
– the value f such that P (F ≥ f) = p: F.INV.RT(p,k1,k2)

    1.52 Statistical Model

• If X1, · · · , Xn is a random sample with a given probability density/mass function f(x1, · · · , xn) which depends on k unknown parameters θ = (θ1, · · · , θk) then, taken together, we have a (parametric) statistical model or a data generating process.

• With a statistical model and given values for θ1, · · · , θk we can simulate an outcome x1, · · · , xn for a random sample using a computer.

• Four common statistical models:

1. X1, · · · , Xn is an independent random sample with E(Xi) = µ and V ar(Xi) = σ2 for i = 1, · · · , n. This model has two unknown parameters, µ, σ2.

2. X1, · · · , Xn is an IID random sample with Xi ∼ N(µ, σ2) for i = 1, · · · , n. This is a special case of 1. We write

Xi ∼ IIDN(µ, σ2) i = 1, · · · , n

3. X1, · · · , Xn is an independent random sample with E(Xi) = µ and V ar(Xi) = σ20 for i = 1, · · · , n where σ20 is a known constant. This model has one unknown parameter µ and is a special case of 1.

4. X1, · · · , Xn is an IID random sample with Xi ∼ N(µ, σ20) for i = 1, · · · , n where σ20 is a known constant. This is a special case of 2 and of 3. We write

    Xi ∼ IIDN(µ, σ20) i = 1, · · · , n

    1.53 Estimator

• If Θ = g(X1, · · · , Xn) is a statistic in a statistical model and we use the outcome of Θ to estimate θ, one of the unknown parameters of the statistical model, then the statistic Θ is called an estimator for θ, denoted by θ̂.

• The sample mean X̄ is the common estimator of the mean µ in the four common statistical models (see section 1.52). Therefore, X̄ can also be denoted by µ̂.

• The sample variance S2 is the common estimator of the variance σ2 in the common statistical models when σ2 is unknown (models 1 and 2). Therefore, S2 can also be denoted by σ̂2.

• If g is a linear function then Θ = g(X1, · · · , Xn) is called a linear statistic, or a linear estimator if it estimates an unknown parameter. The sample mean is a linear estimator while the sample variance is not.

    1.54 Properties of Estimators

• If θ is an unknown parameter in a statistical model and θ̂ is an estimator of θ then we say that the estimator is unbiased if

E(θ̂) = θ

• The sample mean X̄ is an unbiased estimator of the mean µ in the four common statistical models.

• The sample variance S2 is an unbiased estimator of the variance σ2 in the common statistical models when σ2 is unknown (models 1 and 2).

• If θ is an unknown parameter in a statistical model and θ̂1 and θ̂2 are two unbiased estimators of the same parameter θ then we say that θ̂1 is more efficient than θ̂2 if

V ar(θ̂1) ≤ V ar(θ̂2)

• If θ is an unknown parameter in a statistical model and θ̂ is an unbiased estimator which is more efficient than any other unbiased estimator then θ̂ is called a minimum variance unbiased estimator, MVUE, or the best unbiased estimator.

• If Xi ∼ IIDN(µ, σ2), i = 1, · · · , n (common statistical model number 2) then X̄ is MVUE of µ and S2 is MVUE of σ2.

• If θ is an unknown parameter in a statistical model and θ̂ is a linear unbiased estimator which is more efficient than any other linear unbiased estimator then θ̂ is called a best linear unbiased estimator, BLUE.

• If X1, · · · , Xn is an independent random sample with E(Xi) = µ and V ar(Xi) = σ2 for i = 1, · · · , n (common statistical model number 1) then X̄ is BLUE of µ.

    1.55 Standard Error

• Given: θ̂1, · · · , θ̂k which are estimators of the unknown parameters θ1, · · · , θk in a statistical model. The variance of each estimator, V ar(θ̂i), typically depends on some or all of the unknown parameters and is unknown. If each unknown parameter is replaced by its corresponding estimator we get the estimated variance of θ̂i, denoted by V̂ar(θ̂i).

• The square root of the estimated variance of an estimator θ̂ is called the standard error of θ̂, denoted by SE(θ̂)

• In the fundamental statistical model, V ar(X̄) = σ2/n, V̂ar(X̄) = S2/n and SE(X̄) = S/√n as defined before.

    1.56 The Z-statistics

• If X1, · · · , Xn is an IID random sample with E(Xi) = µ and V ar(Xi) = σ20 for i = 1, · · · , n (common statistical model number 3) and µ0 is an arbitrary constant then we define a statistic called the Z-statistic by

Z = (X̄ − µ0) / SD(X̄) = (X̄ − µ0) / (σ0/√n)

• If µ0 = µ then E(Z) = 0, V ar(Z) = 1

    • If Xi ∼ N(µ, σ20) for i = 1, · · · , n (common statistical model number 4) and µ0 = µ then

    Z ∼ N(0, 1)

• Note that if µ0 ≠ µ then E(Z) ≠ 0 and Z cannot follow a standard normal.

    1.57 Generalizing the Z-statistics

• If X1, · · · , Xn is a random sample and Θ is a statistic such that E(Θ) = µΘ and V ar(Θ) = σ2Θ then for a given constant µ0 we define the Z-statistic by

Z = (Θ − µ0) / σΘ

• If Θ ∼ N(µΘ, σ2Θ) and µ0 = µΘ then Z ∼ N(0, 1),

Z = (Θ − µΘ) / σΘ ∼ N(0, 1)

• The Z-statistic defined in section 1.56, where X1, · · · , Xn is an IID random sample with E(Xi) = µ and V ar(Xi) = σ2, is a special case with Θ = X̄, µΘ = µ and σ2Θ = σ2/n.

1.58 The t-statistics

• If X1, · · · , Xn is an IID random sample with E(Xi) = µ and V ar(Xi) = σ2 for i = 1, · · · , n (common statistical model number 1) and µ0 is an arbitrary constant then we define a statistic called the t-statistic by

t = (X̄ − µ0) / SE(X̄) = (X̄ − µ0) / (S/√n)

where S is the square root of the sample variance, see section 1.43.

• If Xi ∼ N(µ, σ2) for i = 1, · · · , n (common statistical model number 2) and µ = µ0 then the t-statistic follows a t-distribution with n − 1 degrees of freedom,

t = (X̄ − µ0) / (S/√n) ∼ tn−1

• Note that if µ0 ≠ µ then the t-statistic will not follow a t-distribution.

    1.59 Generalizing the t-statistics

• If X1, · · · , Xn is a random sample and Θ is a statistic such that E(Θ) = µΘ and V ar(Θ) = σ2Θ, and if σ̂2Θ is a non-negative estimator of σ2Θ, then for a given constant µ0 we define the t-statistic by

t = (Θ − µ0) / σ̂Θ

where σ̂Θ = √σ̂2Θ is the standard error of Θ.

    • If

    – µ0 = µΘ,

– Θ ∼ N(µΘ, σ2Θ)

– pσ̂2Θ/σ2Θ ∼ χ2p

– Θ and σ̂2Θ are independent random variables

then the t-statistic follows a t-distribution with p degrees of freedom,

    t ∼ tp

• The t-statistic for common statistical model number 2 (see section 1.58) is a special case where Θ = X̄, µΘ = µ, σ2Θ = σ2/n, σ̂2Θ = S2/n and p = n − 1.

    1.60 Critical Values

    • If Z ∼ N(0, 1) then we define the critical value zα by

    P (Z > zα) = α

    Because of symmetry, P (Z < −zα) = α as well. In Excel, you can calculate zα using

    NORM.S.INV(1− α)

    see section 1.21.

    • If T ∼ tn then we define the critical value tα,n by

    P (T > tα,n) = α

    Because of symmetry, P (T < −tα,n) = α as well. In Excel, you can calculate tα,n using

    T.INV.RT(α, n)

    see section 1.50. You can calculate the two-tailed critical value tα/2,n using

    T.INV.2T(α, n)

• If X ∼ χ2n then we define the critical value χ2α,n by

    P (X > χ2α,n) = α

    In Excel, you can calculate χ2α,n using

    CHISQ.INV.RT(α, n)

    see section 1.47.

    • If F ∼ F (n1, n2) then we define the critical value fα,n1,n2 by

    P (F > fα,n1,n2) = α

    In Excel, you can calculate fα,n1,n2 using

F.INV.RT(α, n1, n2)

    see section 1.51.

    1.61 Hypothesis Testing, Theory

• Given a statistical model X1, · · · , Xn with unknown parameters θ = (θ1, · · · , θk), a null hypothesis is a subset D0 of D, where D is the set of all possible values for the unknown parameters. The set of remaining values, D \ D0, is called the alternative hypothesis.

• We say that a null hypothesis is true if the true value of θ is inside D0. Otherwise we say that it is false.

    • If D0 contains only a single point, then the null hypothesis is called simple.

    • If at least one of the unknown parameters is restricted to a given value in D0, then the null hypothesisis called sharp.

    • You test a given hypothesis by

– defining a test statistic Θ = g(X1, · · · , Xn)

– defining a rejection region (or critical region), which is a subset of the real numbers

– rejecting the null hypothesis if the outcome of the test statistic falls into the rejection region. If it does not, we say that we fail to reject or do not reject the null hypothesis. We never "accept" a null hypothesis.

• Rejection of the null hypothesis when it is true is called a Type I error. The probability of committing a Type I error is denoted by α. This probability is also called the size of the critical region and the level of significance of the test.

• Non-rejection of the null hypothesis when it is false is called a Type II error. The probability of committing a Type II error is denoted by β. 1 − β is called the power of the test.

• It is common to first select the level of significance α and then to choose the rejection region in such a way that the probability of committing a Type I error is precisely α.

• Once the outcome of the test statistic has been calculated, the p-value of the test is the level of significance at which the observed value of the test statistic lies exactly on an end-point of the critical region, so that we are indifferent between rejecting and not rejecting the null hypothesis.

1.62 Hypothesis Tests Involving the Mean

• Throughout this section, X1, · · · , Xn is an IID random sample with Xi ∼ N(µ, σ2) for i = 1, · · · , n where µ is unknown while σ2 may or may not be known.

• The null hypothesis is always the subset restricted by µ = µ0 where µ0 is a given number. The null hypothesis is written as

    H0 : µ = µ0

• If all values for µ are possible then the alternative hypothesis is the subset restricted by µ ≠ µ0, denoted by

H1 : µ ≠ µ0

This is called a two-sided alternative.

• If we restrict µ to µ ≤ µ0 then the alternative hypothesis is the subset restricted by µ < µ0, denoted by

    H1 : µ < µ0

    This is called a one-sided alternative.

    • Similarly, if we restrict µ to µ ≥ µ0 then the one-sided alternative is

    H1 : µ > µ0

    • All tests have a level of significance equal to α.

1. σ2 is known, two-sided alternative H1 : µ ≠ µ0

• Test statistic: the Z-statistic
• Under the null: Z ∼ N(0, 1)
• Critical region: (−∞,−zα/2) and (zα/2,∞) (see section 1.60)

2. σ2 is known, one-sided alternative H1 : µ < µ0

• Test statistic: the Z-statistic
• Under the null: Z ∼ N(0, 1)
• Critical region: (−∞,−zα)

3. σ2 is known, one-sided alternative H1 : µ > µ0

• Test statistic: the Z-statistic
• Under the null: Z ∼ N(0, 1)
• Critical region: (zα,∞)

4. σ2 is unknown, two-sided alternative H1 : µ ≠ µ0 (a worked sketch follows this list)

• Test statistic: the t-statistic
• Under the null: t ∼ tn−1
• Critical region: (−∞,−tα/2,n−1) and (tα/2,n−1,∞) (see section 1.60)

5. σ2 is unknown, one-sided alternative H1 : µ < µ0

• Test statistic: the t-statistic
• Under the null: t ∼ tn−1
• Critical region: (−∞,−tα,n−1)

6. σ2 is unknown, one-sided alternative H1 : µ > µ0

• Test statistic: the t-statistic
• Under the null: t ∼ tn−1
• Critical region: (tα,n−1,∞)
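
• Case 4 (σ2 unknown, two-sided alternative) can be carried out in a few lines of Python; the data, µ0 and α below are assumptions:

    import numpy as np
    from scipy.stats import t

    x = np.array([5.1, 4.7, 5.6, 5.0, 4.4, 5.3, 5.8, 4.9])   # assumed sample
    mu0, alpha = 5.0, 0.05

    n = len(x)
    t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    t_crit = t.ppf(1 - alpha / 2, n - 1)          # critical value t_{alpha/2, n-1}
    p_value = 2 * t.sf(abs(t_stat), n - 1)

    print(t_stat, t_crit, p_value)
    print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")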

1.63 Confidence Intervals

• Given a statistical model X1, · · · , Xn where θ is one of the unknown parameters, an interval estimate of θ is an interval of the form Θ1 < θ < Θ2 where Θ1 and Θ2 are two statistics such that Θ1 < Θ2 always holds.

    • If Θ1 < θ < Θ2 is an interval estimate of θ such that

    Pr(Θ1 < θ < Θ2) = 1− α

    then Θ1 < θ < Θ2 is a (1− α)100% confidence interval for θ.

    • In general, the smaller the α, the bigger the confidence interval.

    • If X1, · · · , Xn is an IID random sample with Xi ∼ N(µ, σ2) for i = 1, · · · , n where σ2 is known then

X̄ − zα/2 σ/√n < µ < X̄ + zα/2 σ/√n

    is a (1− α)100% confidence interval for µ.

    • If X1, · · · , Xn is an IID random sample with Xi ∼ N(µ, σ2) for i = 1, · · · , n where σ2 is unknown then

X̄ − tα/2,n−1 S/√n < µ < X̄ + tα/2,n−1 S/√n

    is a (1− α)100% confidence interval for µ. S2 is the sample variance, see section 1.43.

• If X1, · · · , Xn is an IID random sample with Xi ∼ N(µ, σ2) for i = 1, · · · , n where σ2 is known or unknown then the null hypothesis H0 : µ = µ0 against the two-sided alternative H1 : µ ≠ µ0 will be rejected at the level of significance α if and only if µ0 is outside the (1 − α)100% confidence interval for µ.
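
• A Python sketch of the second interval (σ2 unknown) computed from data; the sample and α are assumptions:

    import numpy as np
    from scipy.stats import t

    x = np.array([5.1, 4.7, 5.6, 5.0, 4.4, 5.3, 5.8, 4.9])   # assumed sample
    alpha = 0.05

    n, xbar, s = len(x), x.mean(), x.std(ddof=1)
    margin = t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)

    print(xbar - margin, xbar + margin)    # (1 - alpha)100% confidence interval for mu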

    1.64 Asymptotically Unbiased Estimator

• X1, · · · , Xn is a random sample of a statistical model with unknown parameters θ = (θ1, · · · , θk). To investigate how a given statistic Θ or a given estimator θ̂ depends on the sample size, we will add a subscript n and write Θn or θ̂n

• If θ is an unknown parameter in a statistical model of sample size n and θ̂n is an estimator of θ then we say that the estimator is asymptotically unbiased if

    E(θ̂n)→ θ as n→∞

    • An unbiased estimator is always asymptotically unbiased but the opposite is not necessarily true.

    1.65 Consistency

• If X1, · · · , Xn is a random sample and Θn is a given statistic then we say that Θn converges in probability to a constant c if for each ε > 0

P (|Θn − c| < ε)→ 1 as n→∞

We then write

plim Θn = c

• If θ is an unknown parameter in a statistical model of sample size n and θ̂n is an estimator of θ then we say that the estimator is a consistent estimator of θ if

plim θ̂n = θ

    (The opposite is not true in general).

• If E(Θn)→ c and V ar(Θn)→ 0 as n→∞ then Θn converges in probability to c.

• If θ̂n is an asymptotically unbiased estimator of θ and V ar(θ̂n)→ 0 as n→∞ then θ̂n is a consistent estimator of θ.

• If X1, · · · , Xn is an IID random sample with E(Xi) = µ and V ar(Xi) = σ2 for i = 1, · · · , n then X̄ is a consistent estimator of µ and S2 is a consistent estimator of σ2.

    1.66 plim Rules

• If X1, · · · , Xn is a random sample and Θn is a given statistic such that

plim Θn = c

then

plim f(Θn) = f(c)

for any function f : R→ R that is continuous at c.

• If X1, · · · , Xn is a random sample and Θ1, · · · ,Θk are k statistics (subscript n removed for clarity) such that

plim Θj = cj for j = 1, · · · , k

then

plim g(Θ1, · · · ,Θk) = g(c1, · · · , ck)

for any function g : Rk → R that is continuous at (c1, · · · , ck).

• In particular,

plim (Θ1Θ2) = c1c2 and plim (Θ1/Θ2) = c1/c2 (the latter provided c2 ≠ 0)

    • Keep in mind that the plim-rules do not apply unless the statistics converge to a constant.

    1.67 Convergence in Distribution

• If X1, X2, · · · is an infinite sequence of random variables then we say that they converge in distribution to a random variable X if

lim_{n→∞} Fn(x) = F (x)

    for every x at which F is continuous. Here, Fn is the cdf of Xn and F is the cdf of X.

    1.68 Central Limit Theorem

• If X1, · · · , Xn is an IID random sample with E(Xi) = µ and V ar(Xi) = σ2 for i = 1, · · · , n then the Z-statistic converges in distribution to the standard normal.

• Formally, if

Zn = (X̄n − µ) / (σ/√n)

    then Zn → N(0, 1) where convergence is in distribution.

• With the assumptions of this section and with n large enough, X̄ is approximately normally distributed with mean µ and variance σ2/n.

    • With the assumptions of this section and with n large enough, the t-statistic

t = (X̄n − µ) / (S/√n)

    is approximately t-distributed with n− 1 degrees of freedom.
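
• A simulation makes the theorem concrete even for a non-normal population; the exponential population in this Python sketch is an assumption:

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 50, 100_000
    mu, sigma = 1.0, 1.0                  # an exponential with rate 1 has mean 1 and sd 1

    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    z = (xbar - mu) / (sigma / np.sqrt(n))          # the Z-statistic for each replication

    # If Zn is approximately N(0, 1), about 95% of the values fall in (-1.96, 1.96).
    print(np.mean(np.abs(z) < 1.96))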
