
Page 1: Randomness and Probability Part II

Introduction to Statistics : Randomness & Probability

Introduction to StatisticsRandomness and Probability

Part II

Instructor : Siana Halim

-S. Halim -

Page 2: Randomness and Probability Part II


TOPICS

• Understanding Randomness
• From Randomness to Probability
• Probability Rules!
• Random Variables
• Probability Models
• Normal Distribution

References:
• De Veaux, Velleman, Bock, Stats: Data and Models, Pearson Addison Wesley, International Edition, 2005
• John A. Rice, Mathematical Statistics and Data Analysis, Duxbury Press, 1995



Page 4: Randomness and Probability Part II


4. Random Variables

An insurance company offers a "death and disability" policy that pays $10,000 when you die or $5,000 if you are permanently disabled.

It charges a premium of only $50 a year for this benefit. Is the company likely to make a profit selling such a plan?

To answer this question, the company needs to know the probability that its clients will die or be disabled in any year.

Page 5: Randomness and Probability Part II


Expected Value : Center

The amount the company pays out on an individual policy is called a random variable because its value is based on the outcome of a random event.

Let X be a random variable, and let x be a realization of X.

For the insurance company, x can be $10,000 (if you die that year), $5,000 (if you are disabled), or $0 (if neither occurs).

Because we can list all the outcomes, we might formally call this random variable a discrete random variable. Otherwise, we’d call it a continuous random variable. The collection of all possible values and the probabilities that they occur is called the probability model for the random variable.

Page 6: Randomness and Probability Part II


Policyholder outcome    Payout x    Probability P(X = x)
Death                   $10,000     1/1000
Disability              $5,000      2/1000
Neither                 $0          997/1000

We can’t predict what will happen during any given year, but we can say what we expect to happen.

The expected value of a policy is a parameter of this model. In fact, it’s the mean. We’ll signify this with the notation μ (for population mean) or E(X) for expected value.

The expected value of a (discrete) random variable is:

μ = E(X) = ∑ x p(x)

For this case:

μ = E(X) = $10,000 (1/1000) + $5,000 (2/1000) + $0 (997/1000) = $20
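The expected-value calculation above can be sketched in a few lines of Python (a sketch, not from the slides, using the payouts and probabilities from the table):

```python
# Probability model for the policy: payout -> P(X = x), from the table above.
payouts = {10_000: 1 / 1000, 5_000: 2 / 1000, 0: 997 / 1000}

# mu = E(X) = sum over x of x * p(x)
mu = sum(x * p for x, p in payouts.items())
print(mu)  # about 20: the expected payout per policy is $20
```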

Page 7: Randomness and Probability Part II


First Center, Now Spread…

For data, we calculated the standard deviation by first computing each deviation from the mean and squaring it. We do the same with (discrete) random variables. First, we find the deviation of each payout from the mean (expected value):

Policyholder outcome    Payout x    Deviation (x − μ)        Probability P(X = x)
Death                   $10,000     (10,000 − 20) = 9,980    1/1000
Disability              $5,000      (5,000 − 20) = 4,980     2/1000
Neither                 $0          (0 − 20) = −20           997/1000

Next we square each deviation. The variance is the expected value of those squared deviations:

Var(X) = 9,980² (1/1000) + 4,980² (2/1000) + (−20)² (997/1000) = 149,600

SD(X) = √149,600 ≈ $386.78
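As a sketch, the same variance and standard-deviation computation for this policy model (payouts and probabilities from the table above):

```python
# Same probability model as before: payout -> P(X = x).
payouts = {10_000: 1 / 1000, 5_000: 2 / 1000, 0: 997 / 1000}

mu = sum(x * p for x, p in payouts.items())               # E(X) = $20
var = sum((x - mu) ** 2 * p for x, p in payouts.items())  # Var(X) = 149,600
sd = var ** 0.5                                           # SD(X), about $386.78
print(round(var, 2), round(sd, 2))
```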

Page 8: Randomness and Probability Part II


The variance and the standard deviation of a (discrete) random variable are:

σ² = Var(X) = ∑ (x − μ)² P(X = x)

σ = SD(X) = √Var(X)

More About Means and Variances

E(aX) = a E(X)            Var(aX) = a² Var(X)
E(X ± c) = E(X) ± c       Var(X ± c) = Var(X)

Page 9: Randomness and Probability Part II


In general,

• The mean of the sum of two random variables is the sum of the means.

• The mean of the difference of two random variables is the difference of the means.

• If the random variables are independent, the variance of their sum or difference is always the sum of the variances:

E(X ± Y) = E(X) ± E(Y)        Var(X ± Y) = Var(X) + Var(Y)

Beware! For random variables, X + X + X ≠ 3X.
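The warning above can be checked exactly. A sketch in Python, reusing the policy's probability model from the earlier slides: three independent policies X1 + X2 + X3 have variance 3·Var(X), while 3X has variance 9·Var(X), even though the means agree.

```python
from itertools import product

# Probability model for one policy: payout -> P(X = x).
pmf = {10_000: 1 / 1000, 5_000: 2 / 1000, 0: 997 / 1000}

def mean_var(dist):
    m = sum(x * p for x, p in dist.items())
    v = sum((x - m) ** 2 * p for x, p in dist.items())
    return m, v

# Exact distribution of X1 + X2 + X3: enumerate all triples of outcomes.
sum_dist = {}
for (x1, p1), (x2, p2), (x3, p3) in product(pmf.items(), repeat=3):
    s = x1 + x2 + x3
    sum_dist[s] = sum_dist.get(s, 0.0) + p1 * p2 * p3

# Exact distribution of 3X: just rescale the support.
scaled_dist = {3 * x: p for x, p in pmf.items()}

m_sum, v_sum = mean_var(sum_dist)          # mean 60, variance 3 * 149,600
m_scaled, v_scaled = mean_var(scaled_dist) # mean 60, variance 9 * 149,600
print(m_sum, v_sum)
print(m_scaled, v_scaled)
```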

Page 10: Randomness and Probability Part II


What Can Go Wrong?

• Probability models are still just models. Models can be useful, but they are not reality.

• If the model is wrong, so is everything else. Before you try to find the mean or standard deviation of a random variable, check to make sure the probability model is reasonable.

• Watch out for variables that aren’t independent. You can add expected values of any two random variables, but you can only add variances of independent random variables.

• Variances of independent random variables add. Standard deviations don't.

• Variances of independent random variables add, even when you're looking at the difference between them.

• Don't write independent instances of a random variable with notation that makes them look the same. Write X1 + X2 + X3 rather than X + X + X.

Page 11: Randomness and Probability Part II


5. Probability Models

Probability Mass Function

Generally, the probability measure on the sample space determines the probabilities of the various values of X; if those values are denoted by x1, x2, …, then there is a function p such that

p(xi) = P(X = xi)   and   ∑i p(xi) = 1

This function is called the probability mass function, or the frequency function, of the random variable X.

Page 12: Randomness and Probability Part II


Searching for Tiger

You've got to have the Tiger Woods picture, so you start madly opening boxes of cereal, hoping to find one. Assuming that the pictures are randomly distributed, there's a 20% chance you succeed on any box you open. We call the act of opening a box a "trial", and note that:

• There are only two possible outcomes (called success and failure) on each trial. Either you get Tiger's picture or you don't.

• The probability of success, denoted p, is the same on every trial. Here p = 0.2.

• The trials are independent. Finding Tiger in the first box does not change what might happen when you reach for the next box.

Situations like this are called Bernoulli trials.

Page 13: Randomness and Probability Part II


Bernoulli trials are named after Jacob Bernoulli (1654-1705), who studied them systematically.

A Bernoulli random variable takes on only two values, 1 and 0, with probabilities p and q = 1 − p, respectively.

Its probability mass function is thus

p(1) = p
p(0) = 1 − p
p(x) = 0 if x ≠ 0 and x ≠ 1

Compactly,

p(x) = p^x (1 − p)^(1−x) if x = 0 or x = 1, and p(x) = 0 otherwise

Page 14: Randomness and Probability Part II


• What‘s the probability that you will find the Tiger‘s picture in the first box of cereal ? It‘s 20%. We could write P(# boxes = 1) = 0.20.

• How about the probability that you don‘t find Tiger until the second box ? Well, that means you fail on the first trial and then succeed on the second. With the probability of success 20%, the probability of failure will be q = 1-0.2 = 80%. Since the trials are independent, the probability of getting your first success on the second trial is P(#boxes = 2)= (0.8)(0.2) = 0.16.

• What are the chances that you won't find Tiger until the fifth box of cereal? You'd have to fail 4 straight times and then succeed, so P(#boxes = 5) = (0.8)^4 (0.2) = 0.08192.

• How many boxes might you expect to have to open ? We could reason that since Tiger‘s picture is in 20% of the boxes, or 1 in 5, we expect to find his picture on average, in the fifth box; that is, μ = 1/0.2 = 5 boxes.

Page 15: Randomness and Probability Part II


The Geometric Model

Often we want to know how long it will take us to achieve a success. The model that tells us this probability is called the Geometric probability model.

Geometric probability model for Bernoulli trials: Geom(p)

p = probability of success (and q = 1 − p = probability of failure)
X = number of trials until the first success occurs

P(X = x) = q^(x−1) p

Expected value: μ = 1/p
Standard deviation: σ = √(q/p²)

The 10% condition

Bernoulli trials must be independent. If that assumption is violated, it is still okay to proceed as long as the sample is smaller than 10% of the population.

[Figure: probability mass function of a geometric random variable with p = 1/9]
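A sketch of the Geom(p) model applied to the cereal-box search, with p = 0.2 as in the slides:

```python
p = 0.2
q = 1 - p

def geom_pmf(x, p):
    # P(X = x) = q^(x-1) * p: fail x-1 times, then succeed on trial x
    return (1 - p) ** (x - 1) * p

print(geom_pmf(1, p))   # 0.2: Tiger in the first box
print(geom_pmf(2, p))   # about 0.16: one failure, then a success
print(geom_pmf(5, p))   # about 0.08192, matching the earlier slide
mean = 1 / p            # expected number of boxes: 5
sd = q ** 0.5 / p       # sqrt(q)/p, about 4.47
```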

Page 16: Randomness and Probability Part II


The Binomial Model

Same situation, different question. You buy 5 boxes of cereal. What's the probability you get exactly 2 pictures of Tiger Woods?

We are still talking about Bernoulli trials, BUT with a different question.

We want to find the probability of getting 2 successes among the 5 trials.

Before: we asked how long it would take until our first success.

Now: we're interested in the number of successes in the 5 trials. We want to find P(#successes = 2). This is an example of a Binomial probability.

It takes two parameters to define the Binomial model;

the number of trials, n, and the probability of success, p.

We denote this Binom (n,p).

Page 17: Randomness and Probability Part II


Binomial probability model for Bernoulli trials: Binom(n, p)

n = number of trials
p = probability of success (and q = 1 − p = probability of failure)
X = number of successes in n trials

P(X = x) = C(n, x) p^x q^(n−x),  where C(n, x) = n! / (x!(n − x)!)

Expected value: μ = np
Standard deviation: σ = √(npq)

[Figure: Binomial probability mass functions for n = 10, p = 0.1 and for n = 10, p = 0.5]
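A sketch of the Binom(n, p) calculation for this slide's question, the probability of exactly 2 Tiger pictures in 5 boxes with p = 0.2:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) * p^x * q^(n-x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

prob = binom_pmf(2, 5, 0.2)
print(prob)                       # 10 * 0.2^2 * 0.8^3, about 0.2048
mu = 5 * 0.2                      # np = 1 expected picture
sigma = (5 * 0.2 * 0.8) ** 0.5    # sqrt(npq), about 0.894
```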

Page 18: Randomness and Probability Part II


The Poisson Model

When rare events occur together or in clusters, people often want to know whether that happened just by chance or whether something else is going on. If we assume that the events occur independently, we can use a Binomial model to find the probability that a cluster of events like this occurs. For rare events, p will be quite small, and when n is large it may be difficult to compute the exact probability that a cluster of a certain size occurs.

Simeon Denis Poisson was a French mathematician interested in events with very small probability. He originally derived his model to approximate the Binomial model when the probability of success, p, is very small and the number of trials, n , is very large.

Page 19: Randomness and Probability Part II


Poisson probability model for successes: Poisson(λ)

λ = mean number of successes
X = number of successes

P(X = x) = e^(−λ) λ^x / x!

Expected value: E(X) = λ
Standard deviation: SD(X) = √λ

One of the consequences of the Poisson model is that, as long as the mean rate of occurrences stays constant, the occurrence of past events doesn't change the probability of future events.

[Figure: Poisson probability mass functions with (a) λ = 0.1, (b) λ = 1, (c) λ = 5, (d) λ = 10]
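A sketch comparing the Poisson(λ) pmf with the Binomial it approximates when p is small and n is large, taking λ = np. The values n = 1000 and p = 0.002 are illustrative, not from the slides:

```python
from math import exp, factorial, comb

def poisson_pmf(x, lam):
    # P(X = x) = e^(-lam) * lam^x / x!
    return exp(-lam) * lam ** x / factorial(x)

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 1000, 0.002
lam = n * p  # λ = 2
for x in range(4):
    # the two columns of probabilities nearly agree
    print(x, round(binom_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
```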

Page 20: Randomness and Probability Part II


Continuous Random Variables

For a continuous random variable, the role of the probability mass function is taken by a density function, f(x), which has the properties that:

• f is a piecewise continuous function
• f(x) ≥ 0
• ∫_{−∞}^{∞} f(x) dx = 1

If X is a random variable with a density function f, then for any a < b the probability that X falls in the interval (a, b) is the area under the density function between a and b:

P(a < X < b) = ∫_a^b f(x) dx

Page 21: Randomness and Probability Part II


Uniform Random Variables

A uniform random variable on the interval [0, 1] is a model for what we mean when we say "choose a number at random between 0 and 1".

The uniform density function on the interval [0, 1] is defined as follows:

f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 for x < 0 or x > 1

The uniform density on a general interval [a, b] is

f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 for x < a or x > b
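A sketch of the uniform density on a general interval [a, b]; the interval [2, 5] below is an illustrative choice, not from the slides:

```python
def uniform_pdf(x, a, b):
    # f(x) = 1/(b - a) inside [a, b], 0 outside
    return 1 / (b - a) if a <= x <= b else 0.0

a, b = 2.0, 5.0
print(uniform_pdf(3.0, a, b))   # 1/3 inside the interval
print(uniform_pdf(9.0, a, b))   # 0.0 outside

# The total area under the density is 1 (midpoint Riemann sum).
n = 10_000
width = (b - a) / n
area = sum(uniform_pdf(a + (i + 0.5) * width, a, b) * width for i in range(n))
print(round(area, 6))           # 1.0
```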


Page 22: Randomness and Probability Part II


Properties of Continuous Random Variables

The cdf of a continuous random variable X is defined in the same way as for a discrete random variable:

F(x) = P(X ≤ x) = ∫_{−∞}^x f(u) du

The cdf can be used to evaluate the probability that X falls in an interval:

P(a ≤ X ≤ b) = F(b) − F(a)

Because P(X = a) = ∫_a^a f(x) dx = 0 for any single point a, it makes no difference whether the endpoints are included:

P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)


Page 23: Randomness and Probability Part II


The Exponential Density

The exponential density function is

f(x) = λ e^(−λx) for x ≥ 0, and f(x) = 0 for x < 0

The cumulative distribution function is

F(x) = ∫_{−∞}^x f(u) du = 1 − e^(−λx) for x ≥ 0, and F(x) = 0 for x < 0

The exponential distribution is often used to model lifetimes or waiting times, in which context it is conventional to replace x by t.

[Figure: exponential densities with λ = 0.5 (solid), λ = 1 (dotted), and λ = 2 (dashed)]

Page 24: Randomness and Probability Part II


Suppose that we consider modeling the lifetime of an electronic component as an exponential random variable, that the component has lasted a length of time s, and that we wish to calculate the probability that it will last at least t more time units, that is, we wish to find P(T>t + s | T>s) :

P(T > t + s | T > s) = P(T > t + s and T > s) / P(T > s)
                     = P(T > t + s) / P(T > s)
                     = e^(−λ(t+s)) / e^(−λs)
                     = e^(−λt)

We see that the probability that the unit will last t more time units does not depend on s. The exponential distribution is consequently said to be memoryless.

But! It is clearly not a good model for human lifetimes, since the probability that a 16-year-old will live at least 10 more years is not the same as the probability that an 80-year-old will live at least 10 more years.
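A numerical sketch of the memoryless property derived above: P(T > t + s | T > s) equals P(T > t) for an exponential lifetime. The values of λ, s, and t are illustrative.

```python
from math import exp

lam = 0.5

def survival(t, lam):
    # P(T > t) = e^(-lam * t) for the exponential distribution
    return exp(-lam * t)

s, t = 3.0, 2.0
conditional = survival(t + s, lam) / survival(s, lam)  # P(T > t+s | T > s)
print(conditional, survival(t, lam))  # the two numbers coincide
```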


Page 25: Randomness and Probability Part II


6. Normal Probability Models

The density function of the normal distribution depends on two parameters, μ and σ, where −∞ < μ < ∞ and σ > 0:

f(x) = (1 / (σ√(2π))) e^(−(x−μ)² / (2σ²)),   −∞ < x < ∞

The parameters μ and σ are called the mean and standard deviation of the normal density.

The special case for which μ = 0 and σ = 1 is called the standard normal density.


Page 27: Randomness and Probability Part II


Use of the Normal Table

The standard normal table gives the area to the left of a specified z:

P(Z ≤ z) = area under the curve to the left of z

For the probability of an interval [a, b]:

P(a ≤ Z ≤ b) = (area to the left of b) − (area to the left of a)

The following properties can be derived from the symmetry of the density about 0:

(a) P(Z ≤ 0) = 0.5
(b) P(Z ≤ −z) = 1 − P(Z ≤ z)
(c) If z > 0:
    P(Z ≤ z) = 0.5 + P(0 < Z ≤ z)
    P(Z ≤ −z) = 0.5 − P(0 < Z ≤ z)

Property (c) is needed for using other normal tables that give only the probability P(0 < Z ≤ z).

Page 28: Randomness and Probability Part II


Property 1 of the Normal Distribution

If X is N(μ, σ), then Z = (X − μ)/σ is N(0, 1). So

P(X ≤ b) = P((X − μ)/σ ≤ (b − μ)/σ) = P(Z ≤ (b − μ)/σ)

P(a ≤ X ≤ b) = P((a − μ)/σ ≤ (X − μ)/σ ≤ (b − μ)/σ)
             = P((a − μ)/σ ≤ Z ≤ (b − μ)/σ)

where the probabilities for Z are obtained from the standard normal table.
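This standardization can be sketched in code: Python's math module expresses the standard normal cdf via the error function, Φ(z) = (1 + erf(z/√2)) / 2. The values μ = 100, σ = 15 below are illustrative.

```python
from math import erf, sqrt

def phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_interval(a, b, mu, sigma):
    # P(a <= X <= b) = P((a-mu)/sigma <= Z <= (b-mu)/sigma)
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

print(normal_interval(85, 115, 100, 15))  # P(-1 <= Z <= 1), about 0.6827
```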


Page 29: Randomness and Probability Part II


Property 2 of the Normal Distribution

If X is N(μ, σ), then Y = a + bX is N(a + bμ, |b|σ).

Multiplying by a constant b and adding a constant a only changes the mean and the variance of the normal distribution.

Property 3 of the Normal Distribution

The sum of two independent normals is normal. If X is N(μ₁, σ₁) and Y is N(μ₂, σ₂), then for independent X and Y,

X + Y is N(μ, σ), where μ = μ₁ + μ₂ and σ² = σ₁² + σ₂²


Page 30: Randomness and Probability Part II


The Normal Approximation to the Binomial Distribution

If X has the binomial distribution b(n, p), where n is large and p is not too near 0 or 1, the distribution of the standardized variable

Z = (X − np) / √(npq)

is approximately N(0, 1).

Without continuity correction:

P(a ≤ X ≤ b) ≈ P( (a − np)/√(np(1 − p)) ≤ Z ≤ (b − np)/√(np(1 − p)) )

Using the continuity correction:

P(a ≤ X ≤ b) ≈ P( (a − 0.5 − np)/√(np(1 − p)) ≤ Z ≤ (b + 0.5 − np)/√(np(1 − p)) )


Page 31: Randomness and Probability Part II


The Normal Model to the Rescue!

Suppose the Surabaya Red Cross anticipates the need for at least 1850 units of O-negative blood this year. It estimates that it will collect blood from 32,000 donors. How great is the risk that the Surabaya Red Cross will fall short of meeting its need?

We can use the Binomial model with n = 32,000 and p = 0.06. But calculating the probability of collecting fewer than 1850 units of O-negative blood from 32,000 donors is tedious (or outright impossible).

Instead, we should use the Normal model.

Page 32: Randomness and Probability Part II


The Binomial model has mean np = 1920 and standard deviation √(npq) ≈ 42.48. We could try approximating its distribution with a Normal model, using the same mean and standard deviation.

P(X < 1850) = P(Z < (1850 − 1920)/42.48) ≈ P(Z < −1.65) ≈ 0.05

There seems to be about a 5% chance that this Red Cross chapter will run short of O-negative blood.

Can we always use a Normal model to make estimates of Binomial probabilities? No! We can use a Normal model only for a large enough number of trials. And what we mean by "large enough" depends on the probability of success. We need a larger sample if the probability of success is very low (or very high).
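The Red Cross calculation above can be sketched directly, approximating the Binom(32000, 0.06) count with a Normal model of the same mean and standard deviation:

```python
from math import erf, sqrt

def phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 32_000, 0.06
mu = n * p                       # np = 1920
sigma = sqrt(n * p * (1 - p))    # sqrt(npq), about 42.48

z = (1850 - mu) / sigma          # about -1.65
print(round(phi(z), 3))          # about 0.05: the risk of falling short
```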

Page 33: Randomness and Probability Part II


The Success/Failure Condition

A Binomial model is approximately Normal if we expect at least 10 successes and 10 failures: np ≥ 10 and nq ≥ 10.

Math Box. A Normal model extends infinitely in both directions. But a Binomial model must have between 0 and n successes, so if we use a Normal to approximate a Binomial, we have to cut off its tails. We require:

μ − 3σ > 0
Or, in other words: μ > 3σ
For a Binomial, that's: np > 3 √(npq)
Squaring yields: n²p² > 9npq
Now simplify: np > 9q
Since q ≤ 1, we can require: np > 9

Page 34: Randomness and Probability Part II


What Can Go Wrong ?

• Be sure you have Bernoulli trials. Be sure to check the requirements first : two outcomes per trial, a constant probability of success, and independence. Remember the 10% condition provides a reasonable substitute for independence.

• Don’t confuse Geometric and Binomial models. Both involve Bernoulli trials, but the issues are different. If you are repeating trials until your first success, that’s a Geometric probability. If you are counting the number of successes in a specific number of trials, that’s a Binomial probability.

• Don't use the Normal approximation with small n. To use a Normal approximation in place of a Binomial model, there must be at least 10 expected successes and 10 expected failures. For large n when np is small, consider using a Poisson model instead.