
Lecture 2

Today: Statistical Review cont'd:

• Expected value, variance and covariance rules
• Sampling and estimators
• Unbiasedness and efficiency
• Sample equivalents of variance, covariance and correlation

Last time:

Properties of discrete random variables:

Expected value:

E(X) = x_1 p_1 + x_2 p_2 + … + x_n p_n = Σ_{i=1}^{n} x_i p_i

Variance:

σ_X^2 = E[(X – μ_X)^2] = (x_1 – μ_X)^2 p_1 + (x_2 – μ_X)^2 p_2 + … + (x_n – μ_X)^2 p_n = Σ_{i=1}^{n} (x_i – μ_X)^2 p_i

Last time:

Properties of continuous random variables:

Expected value:

E(X) = μ_X = ∫ X f(X) dX

where f(X) is the probability density function.

Variance:

σ_X^2 = E[(X – μ_X)^2] = ∫ (X – μ_X)^2 f(X) dX
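As a quick numerical illustration of the discrete formulas above, here is a minimal NumPy sketch; the fair six-sided die is our own illustrative example, not one from the lecture.

```python
import numpy as np

# A quick check of the discrete formulas, using a fair six-sided die
# as an illustrative example (our own choice, not from the lecture).
x = np.arange(1, 7)                  # possible values x_1, ..., x_n
p = np.full(6, 1 / 6)                # probabilities p_1, ..., p_n

mu = np.sum(x * p)                   # E(X) = sum_i x_i p_i
var = np.sum((x - mu) ** 2 * p)      # sigma_X^2 = sum_i (x_i - mu_X)^2 p_i

print(mu, var)                       # 3.5 and roughly 2.917
```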

© Christopher Dougherty 1999–2006

EXPECTED VALUE RULES

1. E(X + Y) = E(X) + E(Y)

This sequence states the rules for manipulating expected values. First, the additive rule: the expected value of the sum of two random variables is the sum of their expected values.

This generalizes to any number of variables. For example:

E(W + X + Y + Z) = E(W) + E(X) + E(Y) + E(Z)

2. E(bX) = bE(X)

The second rule is the multiplicative rule: the expected value of a variable multiplied by a constant is equal to the constant multiplied by the expected value of the variable.

For example, the expected value of 3X is three times the expected value of X: E(3X) = 3E(X).

3. E(b) = b

Finally, the expected value of a constant is just the constant. Of course this is obvious.

As an exercise, we will use the rules to simplify the expected value of an expression. Suppose that we are interested in the expected value of a variable Y, where Y = b1 + b2X.

E(Y) = E(b1 + b2X)
     = E(b1) + E(b2X)
     = b1 + b2E(X)

We use the first rule to break up the expected value into its two components. Then we use the second rule to replace E(b2X) by b2E(X), and the third rule to simplify E(b1) to just b1. This is as far as we can go in this example.
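The result of the exercise can be checked numerically. Below is a minimal simulation sketch of E(b1 + b2X) = b1 + b2E(X); the values of b1 and b2 and the distribution of X are illustrative assumptions.

```python
import numpy as np

# A minimal simulation sketch of E(b1 + b2 X) = b1 + b2 E(X); the values
# of b1, b2 and the distribution of X are illustrative assumptions.
rng = np.random.default_rng(0)
b1, b2 = 5.0, 2.0
X = rng.normal(loc=10.0, scale=3.0, size=1_000_000)

lhs = np.mean(b1 + b2 * X)           # sample analogue of E(b1 + b2 X)
rhs = b1 + b2 * np.mean(X)           # b1 + b2 times the sample mean

print(lhs, rhs)                      # identical up to floating-point error
```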

ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE

This sequence derives an alternative expression for the population variance of a random variable. It provides an opportunity for practising the use of the expected value rules.

σ_X^2 = E[(X – μ)^2]
      = E(X^2 – 2μX + μ^2)
      = E(X^2) + E(–2μX) + E(μ^2)
      = E(X^2) – 2μE(X) + μ^2
      = E(X^2) – 2μ^2 + μ^2
      = E(X^2) – μ^2

We start with the definition of the population variance of X and expand the quadratic. Then the first expected value rule is used to decompose the expression into three separate expected values.

The second expected value rule is used to simplify the middle term and the third rule is used to simplify the last one. The middle term is then rewritten, using the fact that E(X) and μ_X are just different ways of writing the population mean of X. Hence we get the result: σ_X^2 = E(X^2) – μ^2.
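The identity is easy to verify by simulation. Here is a minimal numerical check; the gamma distribution used for X is an illustrative assumption.

```python
import numpy as np

# A minimal numerical check that var(X) = E(X^2) - mu^2; the gamma
# distribution used for X is an illustrative assumption.
rng = np.random.default_rng(1)
X = rng.gamma(shape=2.0, scale=3.0, size=1_000_000)

lhs = np.var(X)                          # E[(X - mu)^2], ddof=0 by default
rhs = np.mean(X ** 2) - np.mean(X) ** 2  # E(X^2) - mu^2

print(lhs, rhs)                          # agree up to floating-point error
```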

THE FIXED AND RANDOM COMPONENTS OF A RANDOM VARIABLE

In this short sequence we shall decompose a random variable X into its fixed and random components. Let the population mean of X be μ_X.

Population mean of X: E(X) = μ_X

In observation i, the random component is given by u_i = x_i – μ_X.

Hence x_i can be decomposed into fixed and random components: x_i = μ_X + u_i.

Note that the expected value of u_i is zero:

E(u_i) = E(x_i – μ_X) = E(x_i) + E(–μ_X) = μ_X – μ_X = 0

The actual value of X in any observation will in general be different from μ_X. We will call the difference u_i, so u_i = x_i – μ_X. Re-arranging this equation, we can write x_i as the sum of its fixed component, μ_X, which is the same for all observations, and its random component, u_i.

The expected value of the random component is zero. It does not systematically tend to increase or decrease X. It just makes it deviate from its population mean.
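A minimal sketch of the decomposition in code; the normal distribution and its parameters are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of the decomposition x_i = mu_X + u_i; the normal
# distribution and its parameters are illustrative assumptions.
rng = np.random.default_rng(2)
mu_X = 50.0
x = rng.normal(loc=mu_X, scale=10.0, size=1_000_000)

u = x - mu_X                         # random components u_i = x_i - mu_X
print(np.mean(u))                    # close to 0, since E(u_i) = 0
print(np.allclose(x, mu_X + u))      # True: each x_i equals mu_X + u_i
```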

INDEPENDENCE OF TWO RANDOM VARIABLES

Two random variables X and Y are said to be independent if and only if

E[f(X)g(Y)] = E[f(X)] E[g(Y)]

for any functions f(X) and g(Y).

In other words, two variables X and Y are independent if and only if, given any functions f(X) and g(Y), the expected value of the product f(X)g(Y) is equal to the expected value of f(X) multiplied by the expected value of g(Y).

Special case: if X and Y are independent,

E(XY) = E(X) E(Y)

This follows from the definition by taking f(X) = X and g(Y) = Y: when X and Y are independent, the expected value of XY is equal to the expected value of X multiplied by the expected value of Y.
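The special case can be illustrated by simulation. Below is a minimal sketch; the two distributions chosen are illustrative assumptions.

```python
import numpy as np

# A minimal simulation sketch of E(XY) = E(X) E(Y) when X and Y are
# independent; the two distributions are illustrative assumptions.
rng = np.random.default_rng(3)
X = rng.normal(loc=2.0, scale=1.0, size=1_000_000)
Y = rng.exponential(scale=3.0, size=1_000_000)  # drawn independently of X

print(np.mean(X * Y))                # approx E(X) E(Y) = 2 * 3 = 6
print(np.mean(X) * np.mean(Y))
```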

COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance

The covariance of two random variables X and Y, often written σ_XY, is defined to be the expected value of the product of their deviations from their population means:

cov(X, Y) = σ_XY = E[(X – μ_X)(Y – μ_Y)]

If two variables are independent, their covariance is zero:

σ_XY = E[(X – μ_X)(Y – μ_Y)]
     = E(X – μ_X) E(Y – μ_Y)
     = [E(X) – E(μ_X)][E(Y) – E(μ_Y)]
     = [μ_X – μ_X][μ_Y – μ_Y]
     = 0 × 0 = 0

To show this, start by rewriting the covariance as the product of the expected values of its factors. We are allowed to do this because (and only because) X and Y are independent (see the earlier sequence on independence).

The expected values of both factors are zero because E(X) = μ_X and E(Y) = μ_Y. E(μ_X) = μ_X and E(μ_Y) = μ_Y because μ_X and μ_Y are constants. Thus the covariance is zero.
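A one-line simulation check; in a finite sample the covariance of independently generated variables is only close to zero, not exactly zero.

```python
import numpy as np

# A minimal check that independently generated variables have sample
# covariance close to zero (it is exactly zero only in the population).
rng = np.random.default_rng(4)
X = rng.normal(size=1_000_000)
Y = rng.normal(size=1_000_000)       # generated independently of X

print(np.cov(X, Y)[0, 1])            # close to 0
```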

Covariance rules

1. If Y = V + W, cov(X, Y) = cov(X, V) + cov(X, W).
2. If Y = bZ, where b is a constant, cov(X, Y) = b cov(X, Z).
3. If Y = b, where b is a constant, cov(X, Y) = 0.

There are some rules that follow in a perfectly straightforward way from the definition of covariance, and since they are going to be used frequently in later chapters it is worthwhile establishing them immediately. The first is the addition rule; the second is the multiplication rule, for cases where a variable is multiplied by a constant; the third is a primitive rule that is often useful.

The proofs of the rules are straightforward. In each case the proof starts with the definition of cov(X, Y); we then substitute for Y and re-arrange.

Proof of rule 1. Since Y = V + W, μ_Y = μ_V + μ_W.

cov(X, Y) = E[(X – μ_X)(Y – μ_Y)]
          = E[(X – μ_X)((V + W) – (μ_V + μ_W))]
          = E[(X – μ_X)(V – μ_V) + (X – μ_X)(W – μ_W)]
          = E[(X – μ_X)(V – μ_V)] + E[(X – μ_X)(W – μ_W)]
          = cov(X, V) + cov(X, W).

This gives us the result.

Proof of rule 2, the multiplication rule, for cases where a variable is multiplied by a constant. The Y terms are replaced by the corresponding bZ terms. Since Y = bZ, μ_Y = bμ_Z.

cov(X, Y) = E[(X – μ_X)(Y – μ_Y)]
          = E[(X – μ_X)(bZ – bμ_Z)]
          = E[b(X – μ_X)(Z – μ_Z)]
          = b E[(X – μ_X)(Z – μ_Z)]
          = b cov(X, Z).

b is a common factor and can be taken out of the expression, giving us the result that we want.

The proof of the third rule is trivial. Since Y = b, μ_Y = b.

cov(X, Y) = E[(X – μ_X)(Y – μ_Y)]
          = E[(X – μ_X)(b – b)]
          = E(0) = 0.
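All three covariance rules can be confirmed by simulation. Below is a minimal sketch; the variables V, W and the constant b are illustrative assumptions.

```python
import numpy as np

# A minimal simulation check of the three covariance rules; the
# variables V, W and the constant b are illustrative assumptions.
rng = np.random.default_rng(5)
n = 1_000_000
X = rng.normal(size=n)
V = X + rng.normal(size=n)           # deliberately correlated with X
W = rng.normal(size=n)
b = 4.0

def cov(a, c):
    return np.cov(a, c)[0, 1]

print(cov(X, V + W), cov(X, V) + cov(X, W))  # rule 1: equal
print(cov(X, b * V), b * cov(X, V))          # rule 2: equal
print(cov(X, np.full(n, b)))                 # rule 3: exactly 0
```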

Corresponding to the covariance rules, there are parallel rules for variances.

Variance rules

1. If Y = V + W, var(Y) = var(V) + var(W) + 2cov(V, W).
2. If Y = bZ, where b is a constant, var(Y) = b^2 var(Z).
3. If Y = b, where b is a constant, var(Y) = 0.
4. If Y = V + b, where b is a constant, var(Y) = var(V).

The first is the addition rule and the second is the multiplication rule, for cases where a variable is multiplied by a constant. The third rule covers the special case where Y is a constant. Finally, it is useful to state a fourth rule. It depends on the first three, but it is so often of practical value that it is worth keeping in mind separately.

The proofs of these rules can be derived from the results for covariances, noting that the variance of Y is equivalent to the covariance of Y with itself:

var(X) = σ_X^2 = E[(X – μ_X)^2] = E[(X – μ_X)(X – μ_X)] = cov(X, X).

Proof of rule 1:

var(Y) = cov(Y, Y) = cov([V + W], Y)
       = cov(V, Y) + cov(W, Y)
       = cov(V, [V + W]) + cov(W, [V + W])
       = cov(V, V) + cov(V, W) + cov(W, V) + cov(W, W)
       = var(V) + 2cov(V, W) + var(W)

We start by replacing one of the Y arguments by V + W and then use covariance rule 1. We then substitute for the other Y argument in both terms and use covariance rule 1 a second time. This gives us the result. Note that the order of the arguments does not affect a covariance expression, and hence cov(W, V) is the same as cov(V, W).

The proof of variance rule 2 is even more straightforward. We start by writing var(Y) as cov(Y, Y). We then substitute for both of the Y arguments and take the b terms outside as common factors.

Proof of rule 2:

var(Y) = cov(Y, Y) = cov(bZ, bZ) = b^2 cov(Z, Z) = b^2 var(Z).

The third rule is trivial. We make use of covariance rule 3. Obviously, if a variable is constant, it has zero variance.

Proof of rule 3:

var(Y) = cov(b, b) = 0.

The fourth variance rule starts by using the first. The second term on the right side is zero by covariance rule 3, and the third is also zero by variance rule 3.

Proof of rule 4:

var(Y) = var(V) + 2cov(V, b) + var(b) = var(V)

The intuitive reason for this result is easy to understand. If you add a constant to a variable, you shift its entire distribution by that constant. The expected value of the squared deviation from the mean is unaffected.

[Figure: the distributions of V and V + b have identical shape, centred at μ_V and μ_V + b respectively.]
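The four variance rules can also be confirmed by simulation. A minimal sketch follows; V, W and the constant b are illustrative assumptions.

```python
import numpy as np

# A minimal simulation check of the four variance rules; V, W and the
# constant b are illustrative assumptions. ddof=0 keeps np.var and
# np.cov on the same (population) definition.
rng = np.random.default_rng(6)
n = 1_000_000
V = rng.normal(size=n)
W = 0.5 * V + rng.normal(size=n)     # deliberately correlated with V
b = 4.0

cov_VW = np.cov(V, W, ddof=0)[0, 1]
print(np.var(V + W), np.var(V) + np.var(W) + 2 * cov_VW)  # rule 1
print(np.var(b * V), b ** 2 * np.var(V))                  # rule 2
print(np.var(np.full(n, b)))                              # rule 3: 0
print(np.var(V + b), np.var(V))                           # rule 4
```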

Correlation

cov(X, Y) is unsatisfactory as a measure of association between two variables X and Y because it depends on the units of measurement of X and Y. A better measure of association is the population correlation coefficient, because it is dimensionless:

ρ_XY = σ_XY / √(σ_X^2 σ_Y^2)

The numerator possesses the units of measurement of both X and Y. The variances of X and Y in the denominator possess the squared units of measurement of those variables. However, once the square root has been taken into account, the units of measurement of the denominator are the same as those of the numerator, and the expression as a whole is unit free.

If X and Y are independent, ρ_XY will be equal to zero because σ_XY will be zero. If there is a positive association between them, σ_XY, and hence ρ_XY, will be positive. If there is an exact positive linear relationship, ρ_XY will assume its maximum value of 1. Similarly, if there is a negative relationship, ρ_XY will be negative, with minimum value of –1.
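The unit-free property is easy to see numerically. Below is a minimal sketch showing that rescaling X changes the covariance but not the correlation; the data-generating choices are illustrative assumptions.

```python
import numpy as np

# A minimal sketch showing that correlation, unlike covariance, does not
# depend on units of measurement; the data are illustrative assumptions.
rng = np.random.default_rng(7)
X = rng.normal(size=100_000)
Y = 0.6 * X + rng.normal(size=100_000)

X_cm = 100 * X                       # rescale X, e.g. metres to centimetres
print(np.cov(X, Y)[0, 1], np.cov(X_cm, Y)[0, 1])            # covariance scales
print(np.corrcoef(X, Y)[0, 1], np.corrcoef(X_cm, Y)[0, 1])  # correlation unchanged
```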

SAMPLING AND ESTIMATORS

Suppose we have a random variable X and we wish to estimate its unknown population mean μ_X.

Planning (beforehand concepts)

Our first step is to take a sample of n observations {X1, …, Xn}. Before we take the sample, while we are still at the planning stage, the Xi are random quantities. We know that they will be generated randomly from the distribution for X, but we do not know their values in advance. So now we are thinking about random variables on two levels: the random variable X, and its random sample components.

Realization (afterwards concepts)

Once we have taken the sample, we will have a set of numbers {x1, …, xn}. This is called by statisticians a realization. The lower case is to emphasize that these are numbers, not variables.

Back to the plan. Having generated a sample of n observations {X1, …, Xn}, we plan to use them with a mathematical formula to estimate the unknown population mean μ_X. This formula is known as an estimator. In this context, the standard (but not only) estimator is the sample mean

X̄ = (1/n)(X1 + … + Xn)

An estimator is a random variable because it depends on the random quantities {X1, …, Xn}.

The actual number that we obtain, given the realization {x1, …, xn}, is known as our estimate.
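The estimator/estimate distinction is easy to see in code: the sample-mean formula is the estimator, and the number it yields for one realization is the estimate. The population parameters below are illustrative assumptions.

```python
import numpy as np

# A minimal sketch: the sample-mean formula is the estimator; the number
# it yields for one realization is the estimate. Parameters illustrative.
rng = np.random.default_rng(8)
mu_X, n = 10.0, 25

x = rng.normal(loc=mu_X, scale=2.0, size=n)  # one realization {x1, ..., xn}
estimate = np.sum(x) / n                     # (1/n)(x1 + ... + xn)
print(estimate)                              # a single number near mu_X
```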

[Figure: the probability density functions of X and X̄, both centred at μ_X; the distribution of X̄ is the more concentrated of the two.]

We will see why these distinctions are useful and important in a comparison of the distributions of X and X̄. We will start by showing that X̄ has the same mean as X.

E(X̄) = E[(1/n)(X1 + … + Xn)]
     = (1/n) E(X1 + … + Xn)
     = (1/n)[E(X1) + … + E(Xn)]
     = (1/n)(μ_X + … + μ_X)
     = (1/n)(n μ_X)
     = μ_X

We start by replacing X̄ by its definition and then using expected value rule 2 to take 1/n out of the expression as a common factor. Next we use expected value rule 1 to replace the expectation of a sum with a sum of expectations.

Now we come to the bit that requires thought. Start with X1. When we are still at the planning stage, X1 is a random variable and we do not know what its value will be. All we know is that it will be generated randomly from the distribution of X. The expected value of X1, as a beforehand concept, will therefore be μ_X. The same is true for all the other sample components, thinking about them beforehand. Hence we write this line.

Thus we have shown that the mean of the distribution of X̄ is μ_X.
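This result can be illustrated by simulating the sampling distribution of X̄ directly. A minimal sketch follows; the population parameters are illustrative assumptions.

```python
import numpy as np

# A minimal simulation sketch that the sampling distribution of X-bar is
# centred on mu_X; the population parameters are illustrative assumptions.
rng = np.random.default_rng(9)
mu_X, sigma_X, n = 10.0, 2.0, 25

samples = rng.normal(mu_X, sigma_X, size=(100_000, n))  # many samples of size n
xbars = samples.mean(axis=1)                            # one X-bar per sample

print(np.mean(xbars))                # close to mu_X = 10.0
```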

We will next demonstrate that the variance of the distribution of X̄ is smaller than that of X, as depicted in the diagram above.

var(X̄) = var[(1/n)(X1 + … + Xn)]
       = (1/n^2) var(X1 + … + Xn)
       = (1/n^2)[var(X1) + … + var(Xn)]
       = (1/n^2)(σ_X^2 + … + σ_X^2)
       = (1/n^2)(n σ_X^2)
       = σ_X^2 / n

We start by replacing X̄ by its definition and then using variance rule 2 to take 1/n out of the expression as a common factor; since the factor is squared, this gives 1/n^2. Next we use variance rule 1 to replace the variance of a sum with a sum of variances. In principle there are many covariance terms as well, but they are zero if we assume that the sample values are generated independently.

Now we come to the bit that requires thought. Start with X1. When we are still at the planning stage, we do not know what the value of X1 will be. All we know is that it will be generated randomly from the distribution of X. The variance of X1, as a beforehand concept, will therefore be σ_X^2. The same is true for all the other sample components, thinking about them beforehand. Hence we write this line.

Thus we have demonstrated that the variance of the sample mean is equal to the variance of X divided by n, a result with which you will be familiar from your statistics course.
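As with the mean, the variance result can be checked by simulation. A minimal sketch follows; the population parameters are illustrative assumptions.

```python
import numpy as np

# A minimal simulation sketch of var(X-bar) = sigma_X^2 / n; the
# population parameters are illustrative assumptions.
rng = np.random.default_rng(10)
mu_X, sigma_X, n = 10.0, 2.0, 25

samples = rng.normal(mu_X, sigma_X, size=(100_000, n))
xbars = samples.mean(axis=1)

print(np.var(xbars))                 # close to sigma_X^2 / n = 4 / 25 = 0.16
print(sigma_X ** 2 / n)
```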