Chapter 2 Copyright © 2011 Pearson Addison-Wesley. All rights reserved. 1-1

Chapter 2 - Sergio Turner · ... Single: Expected value, mean, variance, standard deviation ... covariance, sums of rvs d) Key ... each weighted by its probability


Review of Probability

The probability framework for statistical inference

a) Random variables and probability distributions

b) One rv: expected value, mean, variance, standard deviation


c) Two rvs: joint vs. marginal vs. conditional distributions; independence, covariance, sums of rvs

d) Key distributions: Normal, Chi-squared, Student t, F

e) Random sampling & sample average’s distribution

f) Large sample approximations

1-2


Random variables & probability distrib’ns

• Random variables (rvs): commute time, #computer crashes

• Rvs can be continuous (time) or discrete (#crashes)

• Outcomes: Mutually exclusive values that a rv can take

• E.g.: no crash, crash once, crash twice, …; numerically: 0, 1, 2, …

• Sample space: set of all outcomes, e.g. {0,1,2,…}


• Event: By definition a collection of outcomes.

• E.g. “crash no more than once” = {0,1} = {no crash, crash once}

• Probability of an outcome/event: Proportion of time it occurs in the long run (after many independent, identical experiments)

1-3


Probability distrib’n – discrete rv

• Probability distribution of a rv: the list of all its outcomes, each with its probability

• The probabilities in the list add up to 1

• Example: M = # computer crashes while you write a paper

• Assume: if four crashes occur, you write the paper by hand, so M < 5


• Event = {0,1} has probability Pr(M=0) + Pr(M=1) = .9

• Cumulative distribution function cdf: Prob. rv is at most given value, e.g. cdf(1) = .9

Outcome (M)    0     1     2     3     4
Pr dist       .80   .10   .06   .03   .01
Cum dist      .80   .90   .96   .99   1.00

1-4
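The cumulative row can be rebuilt from the probability row by adding up the probabilities of all outcomes at or below each value. A minimal Python sketch, assuming the table's numbers; the names `pmf` and `cdf` are illustrative only:

```python
# Probability distribution of M (# computer crashes), copied from the table.
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}

# The probabilities must add up to 1 (up to floating-point rounding).
assert abs(sum(pmf.values()) - 1.0) < 1e-12

def cdf(m: int) -> float:
    """Pr(M <= m): sum the probabilities of all outcomes at most m."""
    return sum(p for outcome, p in pmf.items() if outcome <= m)

for m in sorted(pmf):
    print(m, round(cdf(m), 2))

# The event "crash no more than once" = {0, 1}
print(round(pmf[0] + pmf[1], 2))
```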


Bernoulli distribution – discrete rv

• If a rv has only TWO outcomes, it is called a Bernoulli rv

• E.g.: Let G be gender of next person you meet

• Outcomes are “male”, “female”

• If the probability of one outcome is p, the other's must be 1 – p (since probabilities add up to 1)

1-5


Probability distrib’n – continuous rv

• Cumulative probability distribution cdf(x): Probability rv is at most a given value, x

• p. 19, figure 2.2

• Probability density function pdf(x): intuitively, it is the probability of outcome x … except that with a continuous rv, the probability of any single value x is zero.

• More accurately: the pdf is the function with the property that, for x < y, cdf(y) − cdf(x) = area under the pdf between x and y

• E.g. Pr(commute between 15' and 20' long) = cdf(20') − cdf(15') = .78 − .20 = .58
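The "area under the pdf" property can be checked numerically. Figure 2.2 does not give the commute-time distribution as a formula, so this Python sketch assumes a hypothetical exponential distribution with rate `lam = 0.1` per minute, and compares cdf(y) − cdf(x) with a midpoint-rule approximation of the area between x and y:

```python
import math

# Hypothetical continuous rv T (commute time in minutes): exponential with
# rate lam. This distribution is an assumption for illustration only.
lam = 0.1

def pdf(t: float) -> float:
    """Exponential density: lam * exp(-lam * t) for t >= 0, else 0."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def cdf(t: float) -> float:
    """Exponential cdf: Pr(T <= t) = 1 - exp(-lam * t) for t >= 0, else 0."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0

def area_under_pdf(x: float, y: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the area under pdf between x and y."""
    h = (y - x) / n
    return h * sum(pdf(x + (i + 0.5) * h) for i in range(n))

x, y = 15.0, 20.0
exact = cdf(y) - cdf(x)          # Pr(x < T <= y) from the cdf
approx = area_under_pdf(x, y)    # same probability as an area under the pdf
print(exact, approx)
```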

1-6


Expected values, Mean, Variance

• Expected value E(Y) of a random variable Y: the long-run average value of the rv (after many independent, repeated occurrences)

• Its value is denoted µY … Also called expectation of Y or mean of Y


• Computed as average of outcomes, each weighted by its probability

• E.g.: E(M) = 0×.8 + 1×.1 + 2×.06 + 3×.03 + 4×.01 = .35 … the mean number of crashes is .35
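The same weighted average in Python, using exact fractions so no rounding creeps in (a sketch; the dictionary layout is just one convenient choice):

```python
from fractions import Fraction

# Distribution of M (# crashes) from the table, as exact fractions.
pmf = {0: Fraction(80, 100), 1: Fraction(10, 100), 2: Fraction(6, 100),
       3: Fraction(3, 100), 4: Fraction(1, 100)}

# E(M): each outcome weighted by its probability, then summed.
mean_M = sum(m * p for m, p in pmf.items())
print(float(mean_M))  # 0.35
```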

1-7


Expected values of Bernoulli rv

• Say Bernoulli G has probability distribution Pr(G=1)=p, Pr(G=0)=1-p

• Then E(G) = 1×p + 0×(1−p) = p


• That is, its mean is the probability of outcome 1 (whatever it signifies)

1-8


Formulas for expected value

• Discrete rv: If Y can take k outcome values y1, …, yk with probabilities p1, …, pk respectively, then:

• E(Y) = y1·p1 + … + yk·pk = ∑i yi·pi


• Continuous rv: If Y has a pdf, with values ranging from L to H, then:

• E(Y) = ∫[L,H] y·pdf(y) dy … (just FYI)

• Note: If Y, Z are rvs, then E(Y+Z) = E(Y) +E(Z)

• Note: If c is a constant, then E(cY) = cE(Y)

1-9


Standard deviation and variance

• These measure the spread of a rv

• Variance of Y, var(Y) := E[(Y−µY)²], the expected squared deviation from its mean. Also denoted σ²Y

• Formula: var(Y) = ∑i (yi − µY)²·pi

• Expanding the square: var(Y) = E(Y²) − µY²

• Note: If c is a constant, var(cY) = c²var(Y) and var(c+Y) = var(Y)

• Because it involves a square, var(Y) is not in the same units as Y (e.g. minutes² rather than minutes)

• Standard deviation of Y σY:= square root of var(Y)

• E.g. var(M) = .6475, so stdev(M) ≈ .80
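Both variance formulas, the definition and the expanded-square shortcut, can be checked on M in a few lines of Python (a sketch reusing the table's probabilities):

```python
import math
from fractions import Fraction

pmf = {0: Fraction(80, 100), 1: Fraction(10, 100), 2: Fraction(6, 100),
       3: Fraction(3, 100), 4: Fraction(1, 100)}
mu = sum(m * p for m, p in pmf.items())                      # E(M) = .35

# Definition: expected squared deviation from the mean.
var_def = sum((m - mu) ** 2 * p for m, p in pmf.items())
# Shortcut from expanding the square: E(M^2) - mu^2.
var_short = sum(m ** 2 * p for m, p in pmf.items()) - mu ** 2

sd = math.sqrt(float(var_def))                               # back in "crashes" units
print(float(var_def), float(var_short), round(sd, 2))
```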

1-10


Variance of Bernoulli rv

• Say Bernoulli G has probability distribution Pr(G=1)=p, Pr(G=0)=1-p

• Recall E(G) = p


• So var(G) = (0−p)²(1−p) + (1−p)²p = p(1−p)[p + (1−p)] = p(1−p)
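A quick check of E(G) = p and var(G) = p(1−p) straight from the definitions, for a few illustrative values of p (a Python sketch; the helper name `bernoulli_mean_var` is hypothetical):

```python
from fractions import Fraction

def bernoulli_mean_var(p):
    """Mean and variance of G with Pr(G=1)=p, Pr(G=0)=1-p, from the definitions."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(g * q for g, q in pmf.items())
    var = sum((g - mean) ** 2 * q for g, q in pmf.items())
    return mean, var

for p in (Fraction(1, 2), Fraction(3, 10), Fraction(9, 10)):
    mean, var = bernoulli_mean_var(p)
    assert mean == p and var == p * (1 - p)   # E(G) = p, var(G) = p(1-p)
    print(p, mean, var)
```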

1-11


Mean & Variance of linear function

• Say X is a rv, and Y a linear function of it: Y = a + bX

• Then Y is a rv in its own right

• Its mean E(Y) = E(a + bX) = E(a) + E(bX)


= a + bE(X) … in short, µY = a+bµX

• Recall that if c is a constant, var(cX) = c²var(X) and var(c+X) = var(X), so …

• Var(Y) = var(a+bX) = var(bX) = b²var(X)

• σY = |b|σX upon taking square roots on both sides
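To see that these formulas agree with treating Y = a + bX as a rv in its own right, this Python sketch builds the distribution of Y from that of M, using hypothetical constants a = 5 and b = −2 (a negative b is why σY = |b|σX needs the absolute value):

```python
from fractions import Fraction

pmf_M = {0: Fraction(80, 100), 1: Fraction(10, 100), 2: Fraction(6, 100),
         3: Fraction(3, 100), 4: Fraction(1, 100)}

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def var(pmf):
    mu = mean(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())

a, b = 5, -2  # hypothetical constants
# Y = a + bM: each outcome m maps to a + b*m with the same probability
# (distinct m give distinct y because b != 0).
pmf_Y = {a + b * m: p for m, p in pmf_M.items()}

assert mean(pmf_Y) == a + b * mean(pmf_M)       # mu_Y = a + b mu_X
assert var(pmf_Y) == b ** 2 * var(pmf_M)        # var(Y) = b^2 var(X)
print(float(mean(pmf_Y)), float(var(pmf_Y)))
```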

1-12


Measures of symmetry and tails

Skewness(Y) = $\dfrac{E[(Y-\mu_Y)^3]}{\sigma_Y^3}$ = measure of asymmetry of a distribution

• skewness = 0 for a symmetric distribution

• skewness > (<) 0: distribution has fatter right (left) tail

Kurtosis(Y) = $\dfrac{E[(Y-\mu_Y)^4]}{\sigma_Y^4}$ = measure of mass in tails = measure of probability of large values

• kurtosis = 3: normal distribution

• kurtosis > 3: heavy tails ("leptokurtic")

• For a constant c > 0, Skew.(cY) = Skew.(Y) and Kurt.(cY) = Kurt.(Y): both are "scale-invariant"

1-13
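Skewness and kurtosis of M can be computed directly from its probability table. This Python sketch also checks scale invariance with a hypothetical rescaling c = 60, and shows that with c < 0 the skewness flips sign while the kurtosis does not:

```python
import math

# Distribution of M (# crashes) from the earlier table.
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}

def skew_kurt(pmf):
    """Return (skewness, kurtosis) of a discrete distribution {value: prob}."""
    mu = sum(y * p for y, p in pmf.items())
    sigma = math.sqrt(sum((y - mu) ** 2 * p for y, p in pmf.items()))
    skew = sum((y - mu) ** 3 * p for y, p in pmf.items()) / sigma ** 3
    kurt = sum((y - mu) ** 4 * p for y, p in pmf.items()) / sigma ** 4
    return skew, kurt

skew_M, kurt_M = skew_kurt(pmf)
print(skew_M, kurt_M)   # positive skew (long right tail), kurtosis above 3

# Rescaling by c > 0 (c = 60 is a hypothetical change of units) changes neither.
skew_c, kurt_c = skew_kurt({60 * y: p for y, p in pmf.items()})
# With c = -1 the kurtosis is unchanged but the skewness flips sign.
skew_neg, kurt_neg = skew_kurt({-y: p for y, p in pmf.items()})
print(skew_c, kurt_c, skew_neg, kurt_neg)
```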


Two random variables: joint distributions and covariance

• The joint distribution of two random variables (say X and Y) gives the probability (or pdf) that (X,Y) = (x,y), i.e. that they take particular values jointly.

• Say X = 0 (raining), 1 (not) & Y = 0 (long commute), 1 (not)

• Four outcomes for (X,Y): (0,0), (0,1), (1,0), (1,1)

Y↓ \ X→              Rain X=0   Dry X=1   Total
Long commute Y=0       .15        .07      .22
Short commute Y=1      .15        .63      .78
Total                  .30        .70      1.00

1-15


Two rvs: marginal dist’bn

• The marginal probability distribution of rv Y is the probability distribution of Y alone, with X free to take any value

• That is, Pr(Y=y) := ∑i Pr(X=xi, Y=y)


• E.g. Pr(long commute) = .22 & Pr(rain) = .3

• Useful to compute expectations, variances, etc of Y:

• E(Y) = ∑i yi· pi = 0·Pr(Y=0) + 1·Pr(Y=1) = Pr(Y=1) = .78
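The marginals are just row and column sums of the joint table. A Python sketch with the table stored as a dictionary keyed by (x, y) (this storage layout is an assumption of the sketch, not of the text):

```python
from fractions import Fraction

# Joint distribution Pr(X=x, Y=y) from the table:
# X = 0 (rain) / 1 (dry), Y = 0 (long commute) / 1 (short commute).
joint = {(0, 0): Fraction(15, 100), (1, 0): Fraction(7, 100),
         (0, 1): Fraction(15, 100), (1, 1): Fraction(63, 100)}
assert sum(joint.values()) == 1

# Marginal of each rv: sum the joint probabilities over the other rv.
pr_X = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
pr_Y = {y: sum(p for (_, yj), p in joint.items() if yj == y) for y in (0, 1)}

E_Y = sum(y * p for y, p in pr_Y.items())
print(float(pr_Y[0]), float(pr_X[0]), float(E_Y))
```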

1-16


Two rvs: conditional dist’bn

• The distribution of rv Y conditional on rv X taking a specific value is called the conditional distribution of Y given X.

• Written Pr(Y=y|X=x)


• E.g. Pr(Y=0|X=0) = .5 (when it rains, long and short commutes are equally likely)

• Bayes’ formula: Pr(Y=y|X=x) = Pr(Y=y,X=x)/Pr(X=x)

• Indeed, Pr(Y=0,X=0)/Pr(X=0) = .15/.30 = .5

Note, the denominator uses the marginal dist’bn of X

1-17


Two rvs: conditional expectation

• The conditional expectation (mean) of Y given X, E(Y|X=x), is the mean of the conditional distribution Pr(Y|X=x)

• Discrete: E(Y|X=x):= ∑i Pr(Y=yi|X=x)·yi

• E(Y|X=1) = Pr(Y=0|X=1)·0+Pr(Y=1|X=1)·1 = .63/.7 = .9


• E(Y|X=0) = Pr(Y=0|X=0)·0+Pr(Y=1|X=0)·1 = .5

• Law of iterated expectations: The mean of Y is the weighted average of E(Y|X=xi), with weights given by the probability distribution of X = x1, …, xk.

• That is, E(Y) = ∑i E(Y|X=xi)·Pr(X=xi)

• Compactly, E(Y) = E[E(Y|X)]. E.g. E(Y) = .9·.7 + .5·.3 = .78
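The conditional distribution, the conditional means, and the law of iterated expectations can all be computed from the joint table. A Python sketch (the helper names are hypothetical):

```python
from fractions import Fraction

joint = {(0, 0): Fraction(15, 100), (1, 0): Fraction(7, 100),
         (0, 1): Fraction(15, 100), (1, 1): Fraction(63, 100)}
pr_X = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}   # marginal of X

def cond_Y_given_X(x):
    """Pr(Y=y | X=x) = Pr(Y=y, X=x) / Pr(X=x) for y in {0, 1}."""
    return {y: joint[(x, y)] / pr_X[x] for y in (0, 1)}

def cond_mean_Y(x):
    """E(Y | X=x): the mean of the conditional distribution."""
    return sum(y * p for y, p in cond_Y_given_X(x).items())

# Law of iterated expectations: E(Y) = sum over x of E(Y|X=x) Pr(X=x)
E_Y = sum(cond_mean_Y(x) * pr_X[x] for x in (0, 1))
print(float(cond_Y_given_X(0)[0]), float(cond_mean_Y(1)),
      float(cond_mean_Y(0)), float(E_Y))
```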

1-18


Two rvs: conditional variance

• The conditional variance of Y given X, var(Y|X=x), is the variance of Pr(Y|X=x): E{[Y − E(Y|X=x)]² | X=x}

• Here E(Y|X=x) is a constant, and the expectation is taken with respect to Pr(Y|X=x)

• Discrete: var(Y|X=x) := ∑i Pr(Y=yi|X=x)·[yi − E(Y|X=x)]²

• var(Y|X=1) = Pr(Y=0|X=1)·[0 − E(Y|X=1)]² + Pr(Y=1|X=1)·[1 − E(Y|X=1)]² = .1·[0 − .9]² + .9·[1 − .9]² = .081 + .009 = .09
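The same computation in Python, for both values of X (a sketch; `cond_var_Y` is a hypothetical helper). Since Y given X=1 is Bernoulli with p = .9, the result also matches p(1−p) = .09:

```python
from fractions import Fraction

joint = {(0, 0): Fraction(15, 100), (1, 0): Fraction(7, 100),
         (0, 1): Fraction(15, 100), (1, 1): Fraction(63, 100)}

def cond_var_Y(x):
    """var(Y | X=x): expected squared deviation from E(Y|X=x) under Pr(Y|X=x)."""
    pr_x = joint[(x, 0)] + joint[(x, 1)]
    cond = {y: joint[(x, y)] / pr_x for y in (0, 1)}
    m = sum(y * p for y, p in cond.items())           # E(Y|X=x), a constant here
    return sum(p * (y - m) ** 2 for y, p in cond.items())

print(float(cond_var_Y(1)), float(cond_var_Y(0)))
```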

1-19


Two rvs: independence

• Informally, rvs X, Y are independent if knowing the value of one yields no information about the value of the other.

• Precisely, they are independent if the conditional distribution of Y given X equals the marginal distribution of Y.


• That is, if Pr(Y=y|X=x) = Pr(Y=y) for all possible x and y

• Using Bayes’ formula: Pr(Y=y,X=x) = Pr(Y=y)·Pr(X=x)
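The product criterion is easy to test on the rain/commute table. This Python sketch (the function name `independent` is hypothetical) shows those two rvs are not independent, and that a table built as the product of the same marginals is:

```python
from fractions import Fraction

joint = {(0, 0): Fraction(15, 100), (1, 0): Fraction(7, 100),
         (0, 1): Fraction(15, 100), (1, 1): Fraction(63, 100)}
pr_X = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
pr_Y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}

def independent(joint, pr_X, pr_Y):
    """True iff Pr(X=x, Y=y) = Pr(X=x) * Pr(Y=y) for every pair (x, y)."""
    return all(joint[(x, y)] == pr_X[x] * pr_Y[y] for x in pr_X for y in pr_Y)

print(independent(joint, pr_X, pr_Y))    # False: e.g. .15 vs .3 * .22 = .066

# For contrast, a joint distribution built as the product of the same marginals.
product = {(x, y): pr_X[x] * pr_Y[y] for x in pr_X for y in pr_Y}
print(independent(product, pr_X, pr_Y))  # True
```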

1-20


Aside: Standardizing a rv

• A common transformation of a rv is standardizing it:

• X into X̃ := (X − µX)/σX

• Deviations from mean, divided by standard deviation

• Then E(X̃) = 0 and var(X̃) = 1.

• Thus standardized rvs always have mean 0 and st.dev 1

• Exercise: If c > 0 is a constant, then standardizing cX gives X̃ again

• Thus we say this transformation is scale-invariant: whether X measures time in seconds, minutes or hours, standardizing gives the same result.
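A Python sketch of standardizing the discrete rv M: each outcome x is mapped to (x − µ)/σ with its probability unchanged. It checks mean 0 and variance 1, and the exercise's scale invariance with a hypothetical c = 60:

```python
import math

pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}   # M from the earlier table

def standardize(pmf):
    """Map each outcome x to (x - mu)/sigma; the probabilities stay the same."""
    mu = sum(x * p for x, p in pmf.items())
    sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    return {(x - mu) / sigma: p for x, p in pmf.items()}

z = standardize(pmf)
mean_z = sum(v * p for v, p in z.items())
var_z = sum((v - mean_z) ** 2 * p for v, p in z.items())
print(mean_z, var_z)   # 0 and 1, up to floating-point rounding

# Scale invariance: measuring in other units (c = 60, hypothetical) changes nothing.
z_scaled = standardize({60 * x: p for x, p in pmf.items()})
same = all(math.isclose(a, b) for a, b in zip(sorted(z), sorted(z_scaled)))
print(same)
```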

1-21


Two rvs: covariance

• A measure of how much two rvs X, Y vary together is this:

• Covariance between X and Y cov(X,Y):= E[(X-µx)(Y-µY)]

• It is also denoted σXY

• Expanding, cov(X,Y) = E(XY) – µxµY


• Note, (X-µx) & (Y-µY) are deviations from their means.

• Suppose that when X exceeds its mean, Y also tends to exceed its mean. Then the product of the deviations tends to be positive, and so is the covariance. Likewise, a negative covariance suggests that when X overperforms, Y underperforms, relative to their means.

• Discrete: cov(X,Y) = ∑i ∑j (xi-µx)(yj-µY)·Pr(X=xi,Y=yj)

• Exercise: If X, Y are independent, cov = 0 (converse false)

• Exercise: cov(aX,Y) = a·cov(X,Y). Also, cov(a+X,Y) = cov(X,Y)
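Applied to the rain/commute table, the double-sum formula and the E(XY) − µXµY shortcut give the same positive covariance: dry days (X = 1) go with short commutes (Y = 1). A Python sketch:

```python
from fractions import Fraction

joint = {(0, 0): Fraction(15, 100), (1, 0): Fraction(7, 100),
         (0, 1): Fraction(15, 100), (1, 1): Fraction(63, 100)}
mu_X = sum(x * p for (x, _), p in joint.items())   # .7
mu_Y = sum(y * p for (_, y), p in joint.items())   # .78

# Definition: products of deviations, weighted by the joint probabilities.
cov_def = sum((x - mu_X) * (y - mu_Y) * p for (x, y), p in joint.items())
# Shortcut: E(XY) - mu_X mu_Y.
cov_short = sum(x * y * p for (x, y), p in joint.items()) - mu_X * mu_Y
print(float(cov_def), float(cov_short))
```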

1-22


Two rvs: correlation

• Covariance E[(X−µX)(Y−µY)] involves variables in potentially different scales (e.g. X in minutes, Y in hours), so its size depends on the units and is hard to interpret.

• However, recall that the standardized X̃ = (X−µX)/σX and Ỹ = (Y−µY)/σY are scale-invariant, so E(X̃·Ỹ) makes more sense:


• corr(X,Y) := E(X̃·Ỹ) = … (bottom of last slide) … = cov(X,Y)/(σXσY)

• Rvs are said to be uncorrelated if cov(X,Y) = 0. Then corr=0.

• Exercise: If E(Y|X) does not depend on X (i.e. it equals µY), then X, Y are uncorrelated.

• Fact: Corr is always between -1 and +1
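Both expressions for the correlation can be compared on the rain/commute table. This Python sketch uses floats and a small hypothetical helper `E` for expectations under the joint distribution:

```python
import math

# Joint distribution from the rain/commute table: keys are (x, y).
joint = {(0, 0): 0.15, (1, 0): 0.07, (0, 1): 0.15, (1, 1): 0.63}

def E(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mu_X, mu_Y = E(lambda x, y: x), E(lambda x, y: y)
sd_X = math.sqrt(E(lambda x, y: (x - mu_X) ** 2))
sd_Y = math.sqrt(E(lambda x, y: (y - mu_Y) ** 2))

# corr as the mean product of the standardized rvs ...
corr_std = E(lambda x, y: ((x - mu_X) / sd_X) * ((y - mu_Y) / sd_Y))
# ... and as cov(X, Y) / (sd_X * sd_Y)
cov_XY = E(lambda x, y: (x - mu_X) * (y - mu_Y))
corr_ratio = cov_XY / (sd_X * sd_Y)
print(corr_std, corr_ratio)
```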

1-23


Correlation measures linear association

1-24


Mean and variance of sums of rvs

Some basic consequences of the definitions of E and var:

• E(X+Y) = E(X) + E(Y)

• E(a + bX) = a + bE(X)

• Var(a+bX) = b²var(X)

• Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y)

… so Var(X+Y) = Var(X) + Var(Y) if and only if X, Y are uncorrelated

• Var(X) = E(X²) − µX²

• Cov(a + bX,Y) = bCov(X,Y)

• Cov(X,Y) = E(XY) − µXµY

• Cov(X,X) = Var(X)
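The variance-of-a-sum rule can be verified exactly on the rain/commute table. Because X and Y there are positively correlated, dropping the covariance term gives the wrong answer. A Python sketch with hypothetical helpers `E` and `var`:

```python
from fractions import Fraction

joint = {(0, 0): Fraction(15, 100), (1, 0): Fraction(7, 100),
         (0, 1): Fraction(15, 100), (1, 1): Fraction(63, 100)}

def E(f):
    """Expectation of f(X, Y) under the joint distribution (exact arithmetic)."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

def var(f):
    """Variance of f(X, Y): expected squared deviation from its mean."""
    m = E(f)
    return E(lambda x, y: (f(x, y) - m) ** 2)

var_X = var(lambda x, y: x)
var_Y = var(lambda x, y: y)
cov_XY = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
var_sum = var(lambda x, y: x + y)

print(float(var_sum), float(var_X + var_Y + 2 * cov_XY), float(var_X + var_Y))
```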

1-25


Key distributions: Normal

• The normal distribution with mean µ and variance σ² > 0, denoted N(µ,σ²), is defined by the pdf

$f_Y(y) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{1}{2}\left(\dfrac{y-\mu}{\sigma}\right)^2\right]$

• The factor preceding exp ensures the pdf integrates to 1: ∫fY(y)dy = 1

• One can show that E(Y) = µ, var(Y) = σ², skew. = 0, kurt. = 3

• Standard normal dist'n: Z ~ N(0,1), i.e. zero mean & unit variance

• Its cdf is denoted Φ, so Pr(Z ≤ c) = Φ(c)

• Table of values of Φ on pp. 749-750.

• The table is relevant also for any normal N(µ,σ²): standardize it…

1-26


Key distributions: Normal

• Say Y is N(µ,σ²); set Z := (Y−µ)/σ, so Y = µ + σZ.

• One can show that Z is N(0,1), so that Φ is relevant.

• For example, to look up Pr(Y≤D), note

$\Pr(Y \le D) = \Pr\left(\dfrac{Y-\mu}{\sigma} \le \dfrac{D-\mu}{\sigma}\right) = \Pr\left(Z \le \dfrac{D-\mu}{\sigma}\right) = \Phi\left(\dfrac{D-\mu}{\sigma}\right)$

• You can look this up in the table on p. 750, given D, µ, σ

• Since Φ is a cdf, Pr(Z > K) = 1 − Pr(Z ≤ K) = 1 − Φ(K)

• Also, to look up Pr(A < Z ≤ B), note

$\Pr(A < Z \le B) = 1 - \Pr(Z \le A) - \Pr(Z > B) = 1 - \Phi(A) - [1 - \Phi(B)] = \Phi(B) - \Phi(A)$

1-27
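Standardizing and the two identities above can be checked in Python instead of the printed table. This sketch writes Φ with `math.erf` and cross-checks it against `statistics.NormalDist` (Python 3.8+); the values µ = 10, σ = 2, D = 13 and the endpoints A, B, K are hypothetical:

```python
import math
from statistics import NormalDist

def Phi(z: float) -> float:
    """Standard normal cdf via the error function: (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical normal rv Y ~ N(mu, sigma^2) and cutoff D.
mu, sigma, D = 10.0, 2.0, 13.0
p_standardized = Phi((D - mu) / sigma)     # Pr(Y <= D) = Phi((D - mu)/sigma)
p_direct = NormalDist(mu, sigma).cdf(D)    # same probability, no standardizing
print(p_standardized, p_direct)

# Interval and upper-tail probabilities for Z ~ N(0,1)
A, B, K = -1.0, 1.96, 1.96
print(Phi(B) - Phi(A))   # Pr(A < Z <= B)
print(1 - Phi(K))        # Pr(Z > K)
```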


Key distributions: normal

• An important feature is that normal dist'bns are closed under sums and scalings. That is, if X, Y are jointly (bivariate) normal and a, b are constants, then aX+bY is also normal.

• The mean of aX+bY we already know from our work on expectations: it is aµX + bµY

• Its variance we also know, from before: $\mathrm{var}(aX+bY) = a^2\,\mathrm{var}(X) + b^2\,\mathrm{var}(Y) + 2ab\,\mathrm{cov}(X,Y)$

• Fact: If two jointly normal rvs are uncorrelated, they are independent

• Recall the converse holds for any rvs: if independent, then uncorrelated

1-28


Key distributions: normal

• Fact: If a set of rvs has a multivariate normal dist'bn, then the marginal dist'bn of each is normal

• Fact: If X, Y have a bivariate normal dist'bn, then E(Y|X=x) is linear in x, i.e. E(Y|X=x) = a + bx for all x, for some constants a, b.

1-29


Key distributions: Chi-squared

• The chi-squared distribution with m degrees of freedom is the dist'bn of the sum of the squares of m independent standard normal rvs. Denoted χ²ₘ

• So if X, Y are independent standard normals, then X² + Y² is a chi-squared with df=2.

• Table on p.752 gives some values, given the percentile.

• We see the 95th percentile for a χ²₂ is 5.99

• The chi-squared will appear when we do tests. If we wish to test that a certain error term is statistically insignificant, and know that it has such a dist’bn, then the table will help us.
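For df = 2 the chi-squared cdf has the closed form 1 − exp(−x/2), so the table value 5.99 can be reproduced exactly. The Python sketch below also simulates X² + Y² from the definition, with a fixed seed and an arbitrarily chosen sample size:

```python
import math
import random

# Closed form for df = 2: solve 1 - exp(-x/2) = 0.95 for the 95th percentile.
q95_exact = -2.0 * math.log(0.05)

# Simulation from the definition: X^2 + Y^2 with X, Y independent N(0,1).
rng = random.Random(0)            # fixed seed so the run is reproducible
n = 200_000
draws = sorted(rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(n))
q95_sim = draws[int(0.95 * n)]    # empirical 95th percentile
print(round(q95_exact, 2), round(q95_sim, 2))
```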

1-30


Key distributions: Student t

• The Student t distribution with m degrees of freedom is the dist'bn of the ratio $Z/\sqrt{\chi^2_m/m}$, the ratio of a standard normal over the square root of a chi-squared with df=m divided by m, where these are independent.

• It has the same bell shape as the standard normal, except for fatter tails, which thin out as m grows.

• A table with percentiles for the t dist’bn is on p. 751
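The definition can be simulated directly. This Python sketch (seeded; the sample size and the choices m = 5 and m = 20 are arbitrary) estimates Pr(|T| > 2) and compares it with the standard normal, illustrating fatter tails that thin out as m grows:

```python
import math
import random

rng = random.Random(1)   # fixed seed so reruns give the same estimates

def draw_t(m: int) -> float:
    """One draw of Z / sqrt(W/m), Z ~ N(0,1) independent of W ~ chi-squared(m)."""
    z = rng.gauss(0, 1)
    w = sum(rng.gauss(0, 1) ** 2 for _ in range(m))  # W as a sum of m squared N(0,1)
    return z / math.sqrt(w / m)

def tail_prob(draw, n: int = 50_000, cutoff: float = 2.0) -> float:
    """Monte Carlo estimate of Pr(|T| > cutoff) for draws T = draw()."""
    return sum(abs(draw()) > cutoff for _ in range(n)) / n

tail_z = tail_prob(lambda: rng.gauss(0, 1))
tail_t5 = tail_prob(lambda: draw_t(5))
tail_t20 = tail_prob(lambda: draw_t(20))
print(tail_z, tail_t5, tail_t20)
```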

1-31


Key distributions: F

• The F distribution with m,n degrees of freedom Fm,n is the dist’bn of the ratio (W/m)/(V/n) where W,V are independent chi-squared dist’bns with df=m,n respectively.


• A related dist’bn is Fm,∞, the dist'bn of W/m, where W is a χ²ₘ

• When n is large, Fm,∞ is a good approximation to Fm,n.

• Tables on pp.753-6 give values of these F’s at various percentiles and given various df’s.

1-32