
UNIVERSITY OF ILLINOIS
Department of Economics

Course: Econ 506, Fall 2012                                August 28, 2012
Instructor: Anil K. Bera ([email protected]), 225E DKH
Class Hours: 1:30 - 3:10 TuTh
Class Room: 215 DKH
Office Hours: 12:00 - 1:00 TuTh
TA: Yu-Hsien Kao ([email protected])

Prologue

April 1242
Baghdad, Iraq

Baghdad took no note of the arrival of Shams (Sun) of Tabriz, a wandering Sufi saint, from Samarkand to the city’s famous Dervish Lodge. Shams told the master of the lodge, Baba Zaman, that he wanted to share his accumulated knowledge with the most competent student. Why? Because, Shams said, “Knowledge is like brackish water at the bottom of an old vase unless it flows somewhere.” Baba Zaman got serious and asked a bizarre question: “You say you are ready to deliver all your knowledge to another person. You want to hold the Truth in your palm as if it were a precious pearl and offer it to someone special. That is no small task for a human being. Aren’t you asking too much? What are you willing to pay in return?”

Raising an eyebrow, Shams of Tabriz said firmly, “I am willing to give my head.”

This is an introductory course in mathematical statistics, and its purpose is to prepare you for the econometrics course, Econ 507 (Spring 2013). To carry out a good applied econometrics study, it is necessary to master econometric theory. Econometric theory requires a good knowledge of statistical theory, which in turn has its foundation in probability theory. Finally, one cannot study probability without set theory. Therefore, we will begin at the beginning: we will start with set theory, and discuss probability and the basic structure of statistics. Then we will slowly move into different probability distributions, asymptotic theory, estimation and hypothesis testing.

After doing all this, the whole course will be just like a candle. “It will provide us much valuable light. But let us not forget that a candle will help us to go from one place to another in the dark. If we, however, forget where we are headed and instead concentrate on the candle, what good will it be?”

As you have guessed, the course materials will be highly theoretical. No statistical background will be assumed. However, I will take it for granted that you already know differential and integral calculus and linear algebra. Good Luck!

Course Outline:

1. Introduction

(a) Why statistics?

(b) Statistical data analysis: Life by numbers

2. Probability Theory

(a) Algebra of sets

(b) Random variable


(c) Distribution function of a random variable

(d) Probability mass and density functions

(e) Conditional probability distribution

(f) Bayes theorem and its applications

(g) More on conditional probability distribution

(h) Mathematical expectation

(i) Bivariate moments

(j) Generating functions

(k) Distribution of a function of a random variable

3. Univariate Discrete and Continuous Distributions

(a) The basic distribution–hypergeometric

(b) Binomial distribution (as a limit of hypergeometric)

(c) Poisson distribution (as a limit of binomial)

(d) Normal distribution

(e) Properties of normal distribution

(f) Distributions derived from normal (χ², t and F)

(g) Distributions of sample mean and variance

4. Asymptotic Theory

(a) Law of large numbers

(b) Central limit theorems

5. Estimation

(a) Properties of an estimator

(b) Cramer-Rao inequality

(c) Sufficiency and minimal sufficiency

(d) Minimum variance unbiased estimator and Rao-Blackwell theorem

(e) Maximum likelihood estimation

(f) Nonparametric method and density estimation

6. Hypothesis Testing

(a) Notion of statistical hypothesis testing

(b) Type I and II errors

(c) Uniformly most powerful test and Neyman-Pearson lemma

(d) Likelihood ratio (LR) test

(e) Examples on hypothesis testing

(f) Rao’s score or the Lagrange multiplier (LM) test

(g) Wald (W) test


Recommended Text:

A First Course in Probability and Statistics by B.L.S. Prakasa Rao, 2008, World Scientific.

However, I will not follow this book closely. For your convenience, detailed notes (in four volumes) on the whole course will be made available on the course webpage. As you will notice, the lecture notes, given the subject matter, are very dry and mechanical. We will try to make things more lively by analyzing some interesting data sets (some even depicting your lives) and contemporary real-world problems.

Yu-Hsien Kao, TA for this course, will meet with the class on Fridays, 1:30 - 2:50pm, 215 DKH. Her office hours will be 11:00am - 12:30pm Mondays.

Course Webpage: Please check Compass regularly for Announcements/Updates on Homeworks, Exams, etc.

Assessment: There will be two closed-book examinations. You will also receive four homework assignments. The grading of the course will be based on:

Homework: 20%
First Exam (around mid-semester, on a Thursday): 40%
Second Exam (on the last day of class): 40%

Epilogue

In late October of 1244 in Konya, Turkey, Shams found the student he was looking for: Jalaluddin Rumi, already a famous Islamic scholar in Turkey. Under the tutelage of Shams, Rumi became one of the most revered poets in the world. As Rumi said, “I was raw. I was cooked. I was burned.”

March 1248
Konya, Turkey

Rumi’s son Aladdin hired a killer who did not require much convincing.

It was a windy night, unusually chilly for this time of the year. A few nocturnal animals hoofed and howled from afar. The killer was waiting. Shams of Tabriz came out of the house holding an oil lamp in his hand, walked in the direction of the killer, and stopped only a few steps away from the bush where the killer was hiding.

“It is a lovely night, isn’t it?” Shams asked. Did he know the killer was there? Soon six others joined the killer. The seven of them knocked Shams to the ground, and the killer pulled his dagger out of his belt......

Together they lifted Shams’ body, which was strangely light, and dumped him into a well. Gasping loudly for air, each of them took a step back and waited to hear the sound of Shams’ body hitting the water.

It never came.

Taken from: Elif Shafak (2010), The Forty Rules of Love, Penguin Books.


CONTENTS

1. Introduction

(a) Structure of the course

2. Probability Theory

(a) Algebra of sets

(b) Random variable

(c) Distribution function of a random variable

(d) Probability mass and density functions

(e) Conditional probability distribution

(f) Bayes theorem and its applications

(g) More on conditional probability distribution

(h) Mathematical expectation

(i) Bivariate moments

(j) Generating functions

(k) Distribution of a function of a random variable

3. Univariate Discrete and Continuous Distributions

(a) The basic distribution–hypergeometric

(b) Binomial distribution (as a limit of hypergeometric)

(c) Poisson distribution (as a limit of binomial)

(d) Normal distribution

(e) Properties of normal distribution

(f) Distributions derived from normal (χ², t and F)


Introduction to Statistics for Econometricians

by

Anil K. Bera


1.1 Introduction.

If you look around, you will notice the world is full of uncertainty. With all the enormous amount of past information, we never can tell the exact weather condition of tomorrow. The same is true for many economic variables, such as stock prices, exchange rates, inflation, unemployment, interest rates, mortgage rates, etc. [If you knew the exact future price of a major stock, you could make a million! In that case, of course, you wouldn't be taking this course.] Then, what is the role of statistics in this uncertain world? The basic foundation of statistics is the idea that there is an underlying principle or common rule in the midst of all the chaos and irregularities. Statistics is a science to formulate these common rules in a systematic way. Econometrics is that field of science which deals with the application of statistics to economics. Statistics is applicable to all branches of science and the humanities. You might have heard of fields like sociometry, psychometry, cliometrics and biometrics. These are applications of statistics to sociology, psychology, history and biology, respectively. The application of statistics in economics is somewhat controversial, since unlike the physical or biological sciences, in economics we can't conduct purely random experiments. In most cases what we have is historical data on certain economic variables. For all practical purposes, we can view these data as the result of some random experiment and then use statistical tools to analyze the data. For example, regarding stock price movements, based on the available data we can try to find the underlying probability distribution. This distribution will depend on some unknown parameters which can be estimated using the data. We can also test some hypotheses regarding the parameters, or we can even test whether the (assumed) probability distribution is valid or not.

Just like any other science, in statistics there are many approaches: classical and Bayesian, parametric and nonparametric, etc. These are not always substitutes for each other, and in many cases they can be successfully used as complements of each other. However, in this course, we will concentrate on the classical parametric approach.


2.1 Basic Set Theory.

The objective of econometrics is "advancement of economic theory in its relation to statistics and mathematics." This course is concerned with the "statistics" part, and the foundation of statistics is in probability theory. Again, "probability" is defined for events, and these events can be described as sets or subsets. To see the link:

Econometrics → Statistics → Probability → Event → Set.

Let us start with a definition of a "set".

Definition 2.1.1: A set is any (well defined) collection of objects.

Example 2.1.1:

i) C = {1, 2, 3, ...}, the set of all positive integers.

ii) D = {2, 4, 6, ...}, the set of all positive even integers.

iii) F = {Students attending Econ 472}

iv) G = {Students attending Econ 402}

An object in a set is an element; e.g., 1 is an element of the set C. We will denote this as 1 ∈ C, where "∈" means "belongs to." Note that the set C contains more elements than the set D. We can say that D is a "subset" of C and will denote this as D ⊂ C. Formally,

Definition 2.1.2: Set B is a subset of set A, denoted by B ⊂ A, if x ∈ B implies x ∈ A.

You know that with real numbers we can do a lot of operations, like addition (+), subtraction (−), multiplication (×), etc. Similar operations could also be done with sets; e.g., we can "add" (sort of) two sets, subtract one set from another, etc. Two very important operations are "union" and "intersection" of sets.

Definition 2.1.3: The union of two sets A and B is C, defined by C = A ∪ B, if

C = {x | x ∈ A and/or x ∈ B}.


In other words, by the union of two sets we mean the collection of elements which belong to at least one of the two sets.

Example 2.1.2: In Example 2.1.1, C ∪ D = C. If we define another set E = {1, 3, 5, ...}, the set of all positive odd integers, then C = D ∪ E.

The operation can be defined for more than two sets. Suppose we have n sets A₁, A₂, ..., Aₙ. Then A₁ ∪ A₂ ∪ ... ∪ Aₙ, denoted by ∪ᵢ Aᵢ, is defined as

∪ᵢ Aᵢ = {x | x ∈ at least one Aᵢ, i = 1, 2, ..., n}.

In a similar fashion, we can define the union of an infinite (but countable) number of sets A₁, A₂, A₃, ... as ∪ᵢ Aᵢ = A₁ ∪ A₂ ∪ A₃ ∪ ....

Example 2.1.3: Let Aᵢ = {i}, i.e., A₁ = {1}, A₂ = {2}, etc. Then ∪ᵢ Aᵢ = C of Example 2.1.1.

Or let Aᵢ = [−i, i], an interval in the real line R; then ∪ᵢ Aᵢ = R.

The next concept we discuss is "intersection". The intersection of two sets A and B, denoted by A ∩ B, consists of all the common elements of A and B. Formally,

Definition 2.1.4: The intersection of two sets A and B is C, denoted by C = A ∩ B, if

C = {x | x ∈ A and x ∈ B}.

Example 2.1.4: In Example 2.1.1, C ∩ D = D, and F ∩ G = {students attending both Econ 472 and 402}.

As in the case of "union", we can also define the operation "∩" for more than two sets. For example,

A₁ ∩ A₂ ∩ ... ∩ Aₙ = {x | x ∈ Aᵢ for all i = 1, 2, ..., n},

and for countably many sets,

A₁ ∩ A₂ ∩ A₃ ∩ ... = {x | x ∈ Aᵢ for all i = 1, 2, 3, ...}.

It is easy to represent the above two concepts diagrammatically in what is called a Venn diagram [see Figure 2.1.1].


[Figure 2.1.1: Venn diagram of two sets A and B and their intersection A ∩ B]

[Figure 2.1.2: Venn diagram of the difference A − B]


Continuing with Example 2.1.1, suppose those students taking Econ 472 have already attended Econ 402, i.e., there is no student in the Econ 472 class who is taking Econ 402 now. Then if we talk about F ∩ G, the set will be empty. We will call such a set a null set and will denote it by φ. By definition, for any set A, A ∪ φ = A and A ∩ φ = φ.

Example 2.1.5: In Examples 2.1.1 and 2.1.2, D ∩ E = φ.

Earlier we noted, in Example 2.1.1, that D ⊂ C. Now remove the elements of D from the set C; what we are left with is the set E of Example 2.1.2. We write this as E = C − D, i.e., the difference between two sets consists of the elements of one set after removing the elements that belong to the other set. Formally,

Definition 2.1.5: The difference between two sets A and B, denoted by A − B, is defined as C = A − B = {x | x ∈ A and x ∉ B}. Note, "∉" means "does not belong to". In a Venn diagram, A − B can be represented as in Figure 2.1.2.

Now it is clear that a set consists of elements satisfying certain properties. We can imagine a big set which consists of elements with very little restriction. For example, in Example 2.1.1, regarding the sets C and D, we can think of R, the set of all real numbers. We will vaguely call such a big (reference) set a space and will denote it by S. Note here, C ⊂ S and D ⊂ S. So let S = R. Define Q = {set of all rational numbers}; then S − Q = {set of all irrational numbers}. Another way to think about S − Q is as the "complement" of Q in S, which is denoted as Qᶜ|S.

Definition 2.1.6: The complement of a set A with respect to a space S is Aᶜ|S = {x ∈ S | x ∉ A}.

In most cases, the reference set S will be obvious from the context; we will then omit S from the notation and write Aᶜ|S simply as Aᶜ.

Example 2.1.6: In Examples 2.1.1 and 2.1.2, Dᶜ|C = E and Eᶜ|C = D.

See the Venn diagrams in Figure 2.1.3.

[Figure 2.1.3: Venn diagrams of the complements in Example 2.1.6]

Consider the identity (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ. Without the diagram, we can easily prove this. The trick is: if we want to show that a set C is equal to another set D, show the following:

If for every x, x ∈ C then x ∈ D ⟹ C ⊂ D;
if for every x, x ∈ D then x ∈ C ⟹ D ⊂ C.

Combining these two, we obtain C = D.

Let us prove the above identity. Let x ∈ (A ∪ B)ᶜ, so x ∉ (A ∪ B), i.e., x ∉ A and x ∉ B. In other words, x ∈ Aᶜ and x ∈ Bᶜ, i.e., x ∈ Aᶜ ∩ Bᶜ. Therefore, (A ∪ B)ᶜ ⊂ Aᶜ ∩ Bᶜ. Next assume x ∈ Aᶜ ∩ Bᶜ; reversing the above arguments, we see that x ∈ (A ∪ B)ᶜ. So we have Aᶜ ∩ Bᶜ ⊂ (A ∪ B)ᶜ. Hence (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.

Now try to prove the following identity:

(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.

These identities are known as De Morgan's laws. Try to prove the following generalizations:

(∪ᵢ Aᵢ)ᶜ = ∩ᵢ Aᵢᶜ and (∩ᵢ Aᵢ)ᶜ = ∪ᵢ Aᵢᶜ.

Let us now link up set theory with the concepts of "event" and "probability". Suppose we throw one coin twice. The coin has two sides, head (H) and tail (T). What are the possible outcomes?

Both tails (T T)
Both heads (H H)
Tail, head (T H)
Head, tail (H T)

Collect these together in a set Ω = {(T T), (H H), (T H), (H T)}; this is the collection of all possible outcomes. We may be interested in the following special outcomes:

A₁ = {outcomes with first head} = {(H H), (H T)}.

A₂ = {outcomes with at least one head} = {(H H), (H T), (T H)}.

A₃ = {outcomes with no tail} = {(H H)}.


A₁, A₂, A₃, ... are all events, and note A₁, A₂, A₃ ⊂ Ω. We can think of a collection of subsets of Ω, and a particular event will be an element of that collection. Under this framework we can define the probabilities of different events.

So far we have considered sets which are collections of single elements; e.g., we had a set C = {1, 2, 3, ...}. We can also think of a set whose elements are themselves sets, i.e., a set of sets. We can call this a collection or a class of sets. By giving different structure to this class of sets, we can define many concepts, such as ring and field. For our future purpose, all we need is the concept of a σ-field (sigma field). This will be denoted by A (script A). A σ-field is nothing but a collection of sets A₁, A₂, A₃, ... satisfying the following properties:

(i) A₁, A₂, ... ∈ A ⟹ ∪ᵢ Aᵢ ∈ A.

(ii) If A ∈ A then Aᶜ ∈ A.

In other words, A is closed under the formation of countable unions and under complementation. From the above two conditions, it is clear that for A to be a σ-field, the null set φ and the space Ω must belong to A.

Example 2.1.7: Ω = {1, 2, 3, 4}. A σ-field on Ω can be written as

A = {φ, (1, 2), (3, 4), (1, 2, 3, 4)}.
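For a finite collection, the two defining conditions can be checked mechanically. A sketch in Python for the σ-field of Example 2.1.7; the helper `is_sigma_field` is an illustrative name, and for a finite collection closure under pairwise unions suffices:

```python
# Check closure of a finite collection under complementation and unions.
from itertools import combinations

omega = frozenset({1, 2, 3, 4})
A = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), omega}

def is_sigma_field(coll, space):
    if space not in coll:
        return False
    # (ii) closed under complementation
    if any(space - s not in coll for s in coll):
        return False
    # (i) closed under unions (pairwise closure suffices for a finite collection)
    return all(s | t in coll for s, t in combinations(coll, 2))

print(is_sigma_field(A, omega))   # the collection of Example 2.1.7 passes
# Dropping {3, 4} breaks closure under complementation:
print(is_sigma_field({frozenset(), frozenset({1, 2}), omega}, omega))
```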

Example 2.1.8: Ω = R (the real line), and

A = {countable unions of intervals like (a, b]}.

A is called the Borel field, and the members of A are called Borel sets in R.

2.2 Random Variable.

As you can guess, the word "random" is associated with some sort of uncertainty. If we toss a coin, we know the possibilities: head (H) or tail (T); but we are uncertain about exactly which one will appear. Therefore, "tossing a coin" can be regarded as a random experiment where the possibilities are known but not the exact outcome. In probability theory, the collection of all possible outcomes is known as the sample space.

Example 2.2.1:

(i) Toss a coin. The sample space is Ω = {H, T}.

(ii) Toss two coins, or one coin twice; the sample space is Ω = {(HH), (TT), (HT), (TH)}.

(iii) Throw a die: Ω = {⚀, ⚁, ⚂, ⚃, ⚄, ⚅}, the six faces.

Instead of assigning symbols, we can give these outcomes some numbers (real numbers). For example, for Example (i) above, we can define

X = 0 if the outcome is T,
X = 1 if the outcome is H.

For Example (iii) above, X can take values 1, 2, 3, 4, 5, 6. X defined in such a way is called a random variable. Once a random variable is defined, we can talk about the probability distribution of the random variable.

Let us first formally define "probability". For Example (i), we have the sample space Ω = {H, T}. The σ-field defined on Ω is A = {φ, Ω, (H), (T)}. Elements of A are called the events. "Probability" is nothing but assigning real numbers (satisfying some conditions) to each of these events.

Definition 2.2.1: Probability, denoted as P, is a function from A to [0, 1],

P : A → [0, 1],

satisfying the following axioms:

(i) P(Ω) = 1.

(ii) If A₁, A₂, A₃, ... ∈ A are disjoint (i.e., Aᵢ ∩ Aⱼ = φ for all i ≠ j), then

P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ).


Example 2.2.2:

Ω = {H, T}, A = {φ, Ω, (H), (T)},
P(φ) = 0, P(Ω) = 1, P(H) = 1/2, P(T) = 1/2.

Earlier we indicated that a random variable can be defined by assigning real numbers to the elements of Ω. Now define a σ-field on the real line R and denote it by B. Formally, we can define a random variable X as follows:

Definition 2.2.2: A random variable X is a function X : Ω → R such that for all B ∈ B, X⁻¹(B) ∈ A.

Note that here X⁻¹(B) = {ω ∈ Ω | X(ω) ∈ B}. For a diagrammatic representation of the random variable X as a function, see Figure 2.2.1.

In other words, X(·) is a measurable function from the sample space to the real line. "Measurability" is defined by requiring that the inverse image of X is an element of the σ-field, i.e., an event. Recall that probability is defined only for events. By requiring that X is measurable, in a sense, we are assuring its probability distribution.

Example 2.2.3: Toss a coin twice; then the sample space Ω and a σ-field A can be defined as

Ω = {(HH), (TT), (HT), (TH)}

A = {φ, Ω, (HH), (TT), (HT), (TH), ((HH)(TT)), ((TT)(HT)), ((HT)(TH)), ((HH)(TH)), ((HH)(HT)), ((TT)(TH)), ((HH)(TT)(HT)), ((TT)(HT)(TH)), ((HH)(TT)(TH)), ((HH)(HT)(TH))}.

Define X = number of heads. Then X takes 3 values: X = 0, 1, 2.


[Figure 2.2.1: The random variable X as a mapping from Ω to R, with inverse image X⁻¹]


First assign the following probabilities:

P(HH) = 1/4, P(TT) = 1/4, P(HT) = 1/4, P(TH) = 1/4.

The triplet (Ω, A, P) is called a probability space, and P(A) is the probability of the event A.

Corresponding to (Ω, A, P), there exists another probability space (R, B, P^X), where P^X is defined as

P^X(B) = P[X⁻¹(B)] for B ∈ B.

In the above example, take B = {1}; then

P^X(1) = P[X⁻¹(1)]
       = P[(HT), (TH)]
       = P[(HT) ∪ (TH)]
       = P(HT) + P(TH) (why?)
       = 1/4 + 1/4 = 1/2.

Similarly, we can show that

P^X(0) = 1/4 and P^X(2) = 1/4.

P^X(·) is called the probability measure induced by X. To summarize, we have defined two functions

X : Ω → R
P^X : B → [0, 1],

where B is a σ-field defined on R [see Example 2.1.8].

For the above example, the two functions can be described as

ω            X(ω)   P^X
(TT)         0      1/4
(HT), (TH)   1      1/2
(HH)         2      1/4


The last two columns describe the probability distribution of the random variable X. Sometimes we will simply denote it by P(X).

x    P(X)
0    1/4
1    1/2
2    1/4

Most of the time, probability distributions (of discrete random variables) are presented this way. From the above discussion, it is clear that each such probability distribution originates from an Ω, the sample space of a random experiment.

Definition 2.2.3: The listing of the values along with the corresponding probabilities is called the probability distribution of a random variable.

Note: Strictly speaking, this definition applies to "discrete" random variables only. Later, we will define "discrete" and "continuous" random variables.
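The construction of P^X from (Ω, A, P) in the two-coin example can be sketched in a few lines of Python; the dictionary P and the function X below simply encode the example:

```python
# Induced probability measure P^X for X = number of heads in two tosses.
from collections import defaultdict

P = {('H', 'H'): 0.25, ('T', 'T'): 0.25,
     ('H', 'T'): 0.25, ('T', 'H'): 0.25}   # probability P on the sample space

def X(w):                                   # the random variable
    return w.count('H')

PX = defaultdict(float)
for w, p in P.items():
    PX[X(w)] += p                           # P^X(x) = P[X^{-1}(x)]

print(dict(PX))   # {2: 0.25, 0: 0.25, 1: 0.5}
```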

2.3 Distribution Function of a Random Variable.

Sometimes this is also called the cumulative probability distribution, and it is denoted by F(·). Let us denote by "x" the value(s) X can take; then F(·) is simply defined as

F(x) = Probability of the event X ≤ x = Pr(X ≤ x).

Note: We will use "Pr(·)" to denote the probability of an event without defining the set explicitly, and P(·) or P^X(·) when the set is explicitly stated in the argument. Also note that the probability spaces for P and P^X are, respectively, (Ω, A, P) and (R, B, P^X).

Let us now provide a formal definition of the distribution function. Let

W(x) = {ω ∈ Ω | X(ω) ≤ x}.

Since X is measurable, W(x) ∈ A. In the probability space (R, B, P^X), we can write the probability of W(x) as

P(W(x)) = P^X[(−∞, x]].


This is well defined since (−∞, x] ∈ B. This probability is called the distribution function of X, i.e.,

F(x) = Pr(X ≤ x) = P(W(x)) = P^X[(−∞, x]].

For our example:

ω            x    P^X = Pr(X = x)   F(x) = Pr(X ≤ x)
(TT)         0    1/4               1/4
(HT), (TH)   1    1/2               1/4 + 1/2 = 3/4
(HH)         2    1/4               3/4 + 1/4 = 1

Or simply:

x    F(x)
0    1/4
1    3/4
2    1

If we plot F(x), it will look as in Figure 2.3.1. Note that it is a step function. Also notice the discontinuities at x = 0, 1 and 2.
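The step function F(x) can be computed directly from the table of probabilities. A short sketch for the same example:

```python
# Distribution function F(x) = Pr(X <= x) for X = number of heads.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def F(x):
    # sum the probability mass at all points not exceeding x
    return sum(p for xi, p in pmf.items() if xi <= x)

for x in (-1, 0, 0.5, 1, 1.7, 2, 3):
    print(f"F({x}) = {F(x)}")
# F is flat between the mass points and jumps at x = 0, 1 and 2.
```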

2.3.1 Properties of the Distribution Function.

(i) 0 ≤ F(x) ≤ 1. Since F(x) is nothing but a probability, the result follows from the definition of probability.

(ii) F(x) is a nondecreasing function of x, i.e., if x₁ > x₂, then F(x₁) ≥ F(x₂).

Proof:

F(x₁) = Pr(X ≤ x₁) = P^X[(−∞, x₁]] = P^X(A₁) (say)
F(x₂) = Pr(X ≤ x₂) = P^X[(−∞, x₂]] = P^X(A₂) (say)

Since A₂ ⊂ A₁, we have P^X(A₁) ≥ P^X(A₂) (why?), i.e., F(x₁) ≥ F(x₂).


[Figure 2.3.1: The step function F(x), with jumps at x = 0, 1 and 2]


(iii) F(−∞) = 0, where F(−∞) = lim_{n→∞} F(−n).

Proof: Define the event

Aₙ = {ω ∈ Ω | X(ω) ≤ −n}.

Note that P(Aₙ) = Pr(X ≤ −n) = P^X[(−∞, −n]] = F(−n). Now lim_{n→∞} Aₙ = φ, so

F(−∞) = lim_{n→∞} F(−n) = lim_{n→∞} P(Aₙ)
      = P(lim_{n→∞} Aₙ) (why?)
      = P(φ) = 0. (why?)

Note: The first (why?) follows from the "continuity" property of P(·). It says: if {Aₙ} is a monotone sequence of events, then P(lim_{n→∞} Aₙ) = lim_{n→∞} P(Aₙ). (Try to prove this; see Workout Examples I, Question 6.)

(iv) F(∞) = 1, where F(∞) = lim_{n→∞} F(n).

The proof is similar to (iii). Define

Aₙ = {ω ∈ Ω | X(ω) ≤ n}.

Then F(∞) = lim_{n→∞} P(Aₙ) = P(lim_{n→∞} Aₙ) = P(Ω) = 1.

(v) For all x, F(x) is continuous to the right, or right continuous. [What this really means is that F(x + 0) = F(x), where F(x + 0) = lim_{ε↓0} F(x + ε).]

Proof: Define the set

Aₙ = {ω ∈ Ω | X(ω) ≤ x + 1/n}.

Then F(x + 1/n) = P(Aₙ), and

lim_{n→∞} F(x + 1/n) = lim_{n→∞} P(Aₙ) = P(lim_{n→∞} Aₙ) = P^X[(−∞, x]] = F(x).

Also,

F(x + 0) = lim_{ε↓0} F(x + ε) = lim_{n→∞} F(x + 1/n).

Therefore, F(x + 0) = F(x).

We can show that F(x) may not be continuous to the left, i.e., F(x − 0) ≠ F(x), where F(x − 0) = lim_{ε↓0} F(x − ε). To prove this, define

Bₙ = {ω ∈ Ω | X(ω) ≤ x − 1/n}.

Then

F(x − 0) = lim_{n→∞} F(x − 1/n) = lim_{n→∞} P(Bₙ)
         = P(lim_{n→∞} Bₙ) = P(ω ∈ Ω | X(ω) < x) = Pr(X < x).

However,

F(x) = Pr(X ≤ x) = Pr(X < x) + Pr(X = x). (why?)

Hence,

F(x) − F(x − 0) = Pr(X = x).

Therefore, whenever Pr(X = x) > 0, there will be a jump in F(x) at X = x, or a discontinuity at X = x. In Figure 2.3.1, we noted the discontinuities at x = 0, 1 and 2. Also note that

Pr(X = 0) = 1/4 > 0
Pr(X = 1) = 1/2 > 0
Pr(X = 2) = 1/4 > 0.

If Pr(X = x) = 0 for all x, then F(x) will be continuous since, in that case, F(x) = F(x + 0) = F(x − 0) for all x.

2.4 Probability Mass and Density Functions.

Once we have defined the distribution function, we can talk about the "probability mass function" (for discrete variables) and the "probability density function" (for continuous variables).


Let Ω contain a finite (or countably infinite) number of elements. Here by countably infinite we mean a one-to-one correspondence with the set of positive integers, N = {1, 2, 3, ...}. To see an example, consider an experiment of tossing a coin until we get a head. Then Ω = {H, TH, TTH, ...}. If we define X as the number of trials to get a head, then X = 1, 2, 3, .... Denote Ω = {ω₁, ω₂, ω₃, ...}. Therefore, Ω contains discrete points. For any event A ∈ A, we define the probability

P(A) = Σ_{ωᵢ ∈ A} P(ωᵢ).

A random variable X constructed on such an Ω will also take discrete values. Let us now denote the range of X as 𝒳 and the associated probability space as (𝒳, B, P^X). Therefore, we will have a discrete random variable X with discrete probability distribution P^X. Given that

P^X(𝒳) = 1,

the total mass will be distributed on a discrete number of points. Therefore, the probability distribution of X, P^X, is sometimes called the probability mass function (pmf).

Example 2.4.1:

Ω = {(HH), (TT), (HT), (TH)}, X = # heads.

Then

x    P^X
0    1/4
1    1/2
2    1/4

i.e., P^X(𝒳) = Σᵢ Pr(X = xᵢ) = 1, summing over the three values.

Example 2.4.2:

(i) Toss a coin n times and let X = # heads. Then X takes (n + 1) values, namely X = 0, 1, 2, ..., n. The probability distribution of X, with the corresponding points in the sample space, can be written as

ω              x      P^X
TTTT...TTT     0      (1/2)ⁿ
HTTT...TTT     1      (1/2)ⁿ
THTT...TTT     1      (1/2)ⁿ   (n such points; they add to n(1/2)ⁿ)
...
TTTT...TTH     1      (1/2)ⁿ
HHTT...TTT     2      (1/2)ⁿ
THHT...TTT     2      (1/2)ⁿ
...
TTTT...THH     2      (1/2)ⁿ
...
THHH...HHH     n−1    (1/2)ⁿ
HTHH...HHH     n−1    (1/2)ⁿ   (n such points; they add to n(1/2)ⁿ)
...
HHHH...HHT     n−1    (1/2)ⁿ
HHHH...HHH     n      (1/2)ⁿ

So here Pr(X = 1) = n(1/2)ⁿ, Pr(X = 2) = (n choose 2)(1/2)ⁿ, and so on. Later we will derive this probability distribution simply as a special case of the binomial distribution. Check here that if we add P^X over all the values of X, it is equal to one.

(ii) Let us now consider our earlier example of tossing a coin until we get a head, and define X = # trials to get a head. Then X will take a countably infinite number of values, with the following probability distribution:

ω      x    P^X
H      1    1/2
TH     2    (1/2)²
TTH    3    (1/2)³
...

It is easy to check that here the total probability is equal to 1/2 + (1/2)² + (1/2)³ + ... = 1.
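The geometric series above can be checked numerically with a truncated sum (a sketch, truncating at 30 terms):

```python
# Total probability of the toss-until-first-head distribution:
# Pr(X = x) = (1/2)^x for x = 1, 2, 3, ...
total = sum(0.5 ** x for x in range(1, 31))   # first 30 terms
print(total)   # equals 1 - (1/2)^30, already very close to 1
```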

(iii) Now suppose X takes n values, (x₁, x₂, ..., xₙ) = {xᵢ, i = 1, 2, ..., n}. Let Pr(X = xᵢ) = pᵢ, i = 1, 2, ..., n. The distribution function for this probability mass function is

F(x) = Pr(X ≤ x) = Σ_{xᵢ ≤ x} pᵢ.

Any set of pᵢ's can serve our purpose. All we need is to satisfy the following two conditions:

(i) pᵢ ≥ 0 for all i.

(ii) Σᵢ pᵢ = 1.

As we noted before, when the distribution is discrete there will be jumps in F(x); therefore, it will not be continuous and hence not differentiable. Now suppose F(x) is continuous and differentiable except at a few points, and

f(x) = dF(x)/dx,

where f(x) is a continuous function (except at a few points). We will then call X a continuous random variable with probability density function (p.d.f.) f(x). Therefore, the relation between f(x) and F(x) can also be written as

F(x) = ∫_{−∞}^{x} f(t) dt.

Recall F(∞) = 1; therefore

∫_{−∞}^{∞} f(x) dx = 1.

Also, we noted earlier that F(x) is nondecreasing; therefore we should have f(x) ≥ 0 for all x. We define f(x) to be a pdf of a continuous random variable X if the following two conditions are satisfied:

(i) f(x) ≥ 0 for all x ∈ 𝒳.

(ii) ∫_{−∞}^{∞} f(x) dx = ∫_𝒳 f(x) dx = 1.

Note: Here 𝒳 denotes the range of X.

For a continuous random variable X,

Pr(a ≤ X ≤ b) = Pr[X ≤ b] − Pr[X ≤ a]
             = F(b) − F(a)
             = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx
             = ∫_a^b f(x) dx.

Note that for the discrete case, this probability can be written as

Pr(a ≤ X ≤ b) = Σ_{a ≤ xᵢ ≤ b} Pr(X = xᵢ).

When F is continuous, Pr(X = a) = F(a) − F(a−) = 0. Therefore, for the continuous case, Pr(a ≤ X ≤ b) = Pr(a < X ≤ b) = Pr(a ≤ X < b) = Pr(a < X < b). [See Figure 2.4.1.]


[Figure 2.4.1: Pr(a ≤ X ≤ b) for a continuous random variable]


Example 2.4.3: Let

F(x) = 0, for x < 0
     = x, for x ∈ [0, 1]
     = 1, for x > 1,

as given in Figure 2.4.2. Here F(x) is "differentiable," therefore we can construct f(x) as

f(x) = 0, for x < 0
     = 1, for x ∈ [0, 1]
     = 0, for x > 1.

Simply, we can write this as [see Figure 2.4.3]

f(x) = 1 for x ∈ [0, 1], and 0 elsewhere.

Here X is a continuous random variable; however, note the discontinuities of f(x) at 0 and 1. This distribution is known as the uniform distribution [since for x ∈ [0, 1], f(x) is constant].
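The two pdf conditions can be verified numerically for this f(x); a sketch using a crude Riemann sum over [−1, 2]:

```python
# Check the two pdf conditions for the uniform density on [0, 1].
def f(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

n = 100_000
grid = [-1.0 + 3.0 * i / n for i in range(n)]     # grid over [-1, 2]

assert all(f(x) >= 0 for x in grid)               # (i) nonnegativity
integral = sum(f(x) for x in grid) * (3.0 / n)    # (ii) Riemann sum of the integral
print(round(integral, 3))                         # approximately 1
```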

So far we have talked about variables which are either discrete or continuous. A random variable, however, could be of mixed type. Let

X = expenditure on cars.

If we assume X is continuous, then Pr(X = 0) = 0. But there will be many individuals who do not have any expenditure on cars. Suppose half of the people have no expenditure on cars during a certain period; then it is reasonable to put Pr(X = 0) = 0.5. Suppose we assume F(x) = 0.5 + 0.5(1 − e⁻ˣ) for x > 0, and F(x) = 0 for x < 0. The corresponding probability function is [see Figure 2.4.4]

Pr(X < 0) = 0
Pr(X = 0) = 0.5
f(x) = 0.5e⁻ˣ for x > 0.

[Figure 2.4.2: The distribution function F(x) of the uniform distribution]

[Figure 2.4.3: The density f(x) of the uniform distribution]

[Figure 2.4.4: The probability function of the mixed distribution]

Note that here f(x) ≥ 0 and

∫_{−∞}^{∞} f(x) dx = 0.5 + 0.5 ∫₀^∞ e⁻ˣ dx = 1.0.

Hence, this is a well defined probability distribution.
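The same numerical check works for the mixed distribution: a point mass of 0.5 at zero plus the continuous part 0.5e⁻ˣ. A sketch, truncating the integral at x = 50 and using the midpoint rule:

```python
# Total probability of the mixed car-expenditure distribution:
# a point mass Pr(X = 0) = 0.5 plus the density f(x) = 0.5*exp(-x) for x > 0.
import math

mass_at_zero = 0.5
n, upper = 200_000, 50.0                  # truncate the integral at x = 50
dx = upper / n
continuous = sum(0.5 * math.exp(-(i + 0.5) * dx) for i in range(n)) * dx

total = mass_at_zero + continuous
print(round(total, 6))                    # 1.0
```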

2.5 Conditional Probability Distribution.

Let us consider two events A, B ∈ A. We are interested in evaluating the probability of A only for those cases when B also occurs. We will denote this probability as P(A|B) and will assume P(B) > 0. We can treat B as the (total) sample space. First note that

P(A|B) = P(A ∩ B|B).

This is true because when Ω is the sample space,

P(A) = P(A|Ω) = P(A ∩ Ω|Ω).

Here B is our sample space. Also note that P(B|B) = 1. Now,

P(A|B) = P(A ∩ B|B) = P(A ∩ B|Ω)/P(B|Ω) (why?) = P(A ∩ B)/P(B).

We will write this conditional probability simply as P(A|B) = P(A ∩ B)/P(B). This is called the conditional probability of (event) A given (event) B.

Note: (Above why?) Use the old definition of probability:

P(A ∩ B|B) = (#cases for A ∩ B)/(#cases for B)
           = (#cases for A ∩ B/#cases in Ω)/(#cases for B/#cases in Ω)
           = P(A ∩ B|Ω)/P(B|Ω).

Example 2.5.1: Let

Ω = {(HH), (TT), (HT), (TH)}

and A = {(HT)}, B = {(HT), (TH)}, so A ∩ B = {(HT)}.


Therefore, P(A) = 1/4, P(B) = 1/2, P(A ∩ B) = 1/4.

Let us first intuitively find the conditional probabilities. For (A|B), we know that either (HT) or (TH) has appeared, and we want to find the probability that (HT) has occurred. Since all the elements of Ω have equal probability, P(A|B) = 1/2. Similarly, P(B|A) = 1, since (HT) has already occurred. Now let us use the formula to get the conditional probabilities:

P(A|B) = P(A ∩ B)/P(B) = (1/4)/(1/2) = 1/2 ≠ P(A)

P(B|A) = P(A ∩ B)/P(A) = (1/4)/(1/4) = 1 ≠ P(B). (Interpret this result.)

Here the probability of A (or B) changes after it is given that B (or A) has appeared. In such a case, we say that the two events A and B are dependent.

Example 2.5.2: Let us continue with the same sample space

Ω = {(HH), (TT), (HT), (TH)}

but now assume A = {(TT), (HT)} and B = {(HT), (TH)}. We have A ∩ B = {(HT)}. Therefore, P(A) = 1/2, P(B) = 1/2, P(A ∩ B) = 1/4, and

P(A|B) = P(A ∩ B)/P(B) = (1/4)/(1/2) = 1/2 = P(A) (Interpret this result.)

P(B|A) = P(A ∩ B)/P(A) = (1/4)/(1/2) = 1/2 = P(B).

Therefore, we have P(A|B) = P(A ∩ B)/P(B) = P(A), i.e., P(A ∩ B) = P(A)·P(B). In this case, we say that A and B are independent.
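Since all outcomes in Ω are equally likely, both examples reduce to counting and can be replayed by enumeration. A sketch; the helpers `prob` and `cond` are illustrative names:

```python
# Conditional probability by counting equally likely outcomes.
from fractions import Fraction

omega = [('H', 'H'), ('T', 'T'), ('H', 'T'), ('T', 'H')]

def prob(event):
    return Fraction(sum(1 for w in omega if w in event), len(omega))

def cond(A, B):
    # P(A|B) = P(A ∩ B) / P(B)
    return prob([w for w in A if w in B]) / prob(B)

A1, B1 = [('H', 'T')], [('H', 'T'), ('T', 'H')]              # Example 2.5.1
print(cond(A1, B1), cond(B1, A1))                            # 1/2 and 1: dependent

A2, B2 = [('T', 'T'), ('H', 'T')], [('H', 'T'), ('T', 'H')]  # Example 2.5.2
print(cond(A2, B2) == prob(A2))                              # True: independent
```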

Result 2.5.1: Conditional probability satisfies the axioms of probability.

Proof:

(i) P(A|B) = P(A ∩ B)/P(B) ≥ 0.

(ii) P(Ω|B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1.

(iii) Let A₁, A₂, A₃, ... be a sequence of disjoint events; then

P(∪ᵢ Aᵢ|B) = P((∪ᵢ Aᵢ) ∩ B)/P(B)
           = P(∪ᵢ (Aᵢ ∩ B))/P(B)
           = Σᵢ P(Aᵢ ∩ B)/P(B)
           = Σᵢ P(Aᵢ|B).

Note that the (Aᵢ ∩ B)'s are disjoint, since (Aᵢ ∩ B) ∩ (Aⱼ ∩ B) = Aᵢ ∩ Aⱼ ∩ B = φ for i ≠ j.

Note: Conditional distributions are very useful in many practical applications, such as:

(i) Forecasting: Given data on T periods, X_1, X_2, ..., X_T, if we want to forecast the value in the (T + 1)th period, it can be obtained from the conditional distribution P(X_{T+1}|X_1, X_2, ..., X_T).

(ii) Duration dependence: We can consider the conditional probability of getting a job given the duration of unemployment.

(iii) Wage differential: Wage distributions could be different for unionized and non-unionized workers.